

Tests & Measurements 
in 


Physical Education 


by 
JOHN F. BOVARD 


Emeritus Professor of Physical Education, 
University of California, Los Angeles 


FREDERICK W. COZENS 


Professor and Director of Physical Education 
University of California, Berkeley 


and 
E. PATRICIA HAGMAN 


Assistant Professor of Health and Physical Education,
Teachers College, Columbia University


Third Edition 




PHILADELPHIA AND LONDON 


W. B. SAUNDERS COMPANY 
1949 


Copyright, 1930 and 1938, by W. B. Saunders Company 
Copyright, 1949, by W. B. Saunders Company 


COPYRIGHT UNDER THE INTERNATIONAL COPYRIGHT UNION 


All Rights Reserved 
This book is protected by copyright. No part of it 
may be duplicated or reproduced in any manner 
without written permission from the publisher 


Reprinted August, 1949 


MADE IN U.S.A.
PRESS OF
W. B. SAUNDERS COMPANY 


PHILADELPHIA 


PREFACE 
TO THE THIRD EDITION 


Students of physical education face a challenging situation. No 
longer are we willing to allow our work to be characterized as just 
"exercise." We claim that physical education is an important 
subject like any other in the curriculum; that education of and
through the body represents an essential phase of the educative
process. Physical education, then, is confronted with the problem
of discovering the laws, the principles and the fundamentals on 
which to build such a science. To do this we must adopt a research 
attitude, experiment under proper conditions with the best possible 
scientific equipment, and make full use of measurement in our 
day-to-day teaching. 

This book aims to present the field of measurement in physical 
education as it has come down to us. Its purpose is to assist the 
student and teacher to understand the place and importance of 
measurement in the teaching process, and to be familiar with the 
tools of scientific measurement now available in physical education. 
It is further hoped that this text will serve as background for those 
among us who will be active in the continued research which is 
necessary to progress in our field.

The authors have chosen deliberately to keep the scope of this 
subject within the range of the student in teacher education schools 
and others who may never find the opportunity to enter into any 
specific investigation, but who, nevertheless, want to know what is
being done and how to interpret and use intelligently what more 
mathematically minded researchers may discover. It has been our 
aim to keep all statistical formulae and methods as simple as possible 
and still present the essentials necessary to construct a test. 




The book also presents a number of references to commonly used
tables and tests with the hope that the volume will become a handy 
compendium of useful information to student and instructor alike. 


USE OF THE TEXT 


In order to have an understanding of much of our present-day 
literature, it is not only important but essential that the student and 
teacher be familiar with elementary statistical procedures. Certain 
groups of students seem to have developed a fear complex when 
confronted with simple statistical techniques and tabular material, 
and to these it should be made clear that there is nothing mysterious 
connected with the solution of problems involving such procedures. 
To the uninitiated, however, certain seeming complications may 
be eliminated by the logical presentation of material and for that 
reason an explanation of the use of the text seems desirable. 

The student will note that the text is divided into three parts: 
The Status of Measurement, The Tools of Measurement and The 
Theory and Practice of Test Administration. It is unthinkable 
that all of this material can be covered in a first course in Physical 
Education Tests and Measurements. Rather, the text is designed to 
cover a much broader range of knowledge. Part III, for example,
cannot possibly be used in an elementary course and should only 
be attempted by graduate students thoroughly familiar with ele- 
mentary statistical procedures. Little, if any, of the material in 
Chapter XIV can be undertaken in a first course unless the student 
has previously had educational statistics. The more advanced 
procedures are included in the text for the sake of offering a complete 
unit in one volume and should only be attempted after the student 
gains an understanding of statistical terms and a familiarity with 
the simple techniques. 

The teacher of the professional course in Physical Education Tests
and Measurements must supplement the brief explanations of statistical
procedures and should have at hand at least one good text
on educational statistics, since it is not within the scope of this text
to present detailed material to which the average text on statistics
devotes two hundred or more pages.


There are at least three methods of offering a course in Physical
Education Tests and Measurements:


1. Set up a course in educational statistics as a prerequisite. This
scheme will permit a larger amount of time to be devoted to the
history of physical education tests, to the application of statistical
procedures in the more recent contributions and to the solution of
problems involved in the presentation of material in the chapters on
Elementary Graphical Methods and Methods of Scoring Tests.

2. First familiarize the student with the elementary statistical 
procedures outlined in Chapter XIII and later take up the history 
of measurement in physical education pointing out the statistical 
procedures which have been used in constructing the various tests, 
with particular emphasis on procedures used in recent test material. 
It is essential that a large number of problems involving the use of 
simple procedures be solved in order that facility may be gained. 
Particular attention should be given to problems involving the use 
of the normal probability table (Chapter XV), and to the various 
ways of scaling tests and setting up scoring tables (Chapter XVI). 

3. First acquire a background of the measurement movement in 
physical education and then introduce the student to statistical 
procedures and the solution of simple problems. In choosing this 
method it is important that time enough be allowed for the instructor 
to review in detail the statistical procedures used in an adequate 
sampling of the more recent test contributions. The student will not 
readily make the application unless a variety of problems is offered. 

Since each of these three methods has been tried with success, there 
appears to be no necessity for a particular recommendation. How- 
ever, the authors wish to emphasize strongly the advisability of 
presenting problem-solving situations. Only by the application of
statistical procedures can proper insight be gained.


TEXT REVISION 


The revision of text material for the third edition has been fairly
extensive, since a great deal of new material has come into the field
since 1938. Chapter I, The Need for and Use of Measurement, has
been almost completely revised. Because of the increasing interest
in evaluation in both educational literature and practice, this
process has been defined, and the importance of measurement
thereto indicated. Chapter II, Development of Measurement in
Physical Education: Brief Historical Sketch, has been condensed.
Material on the early development of physical education measurement
has been left intact, but recent developments have been summarized
in terms of the general status of the various types of tests
and measurements currently used. The remainder of the material
of Part I, The Status of Measurement in Physical Education, has been
generally reorganized, based upon a classification of tests according
to purpose. This organization has resulted in the elimination or
revision of several chapter titles, and the addition of five new
chapters.

New chapter titles include: Athletic Achievement Tests and 
Scoring Scales (Chapter V), The Measurement of General Qualities — 
Strength and Power (Chapter VII), The Measurement of General 
Qualities (Continued) — Motor Ability, Capacity, and Educability; Neuromuscular
Control (Chapter VIII), Physical Fitness and Motor Fitness
Tests (Chapter IX), and Rating Scales (Chapter XII). Material in 
the latter two chapters is almost entirely new. Chapter IX, Physical 
Fitness and Motor Fitness Tests, includes many of the tests developed 
as a result of emphasis on physical fitness during World War II. 
Chapter XII, Rating Scales, discusses the proper use of rating scales 
in physical education measurement, as well as available rating 
techniques and devices. 

The presentation of the various phases of measurement in physical 
education has been brought up to date throughout, and a great deal 
of the tabular material dealing with obsolete tests has been deleted, 
particularly in the Appendix. The selected annotated references at 
the end of each chapter have been almost completely revised. These 
references include later material of a supplementary nature from 
both general education and physical education. Some of the refer- 
ences deal with recent research related to specific tests discussed in 
the text. 

Parts II and III are substantially the same as in the second 
edition. Minor changes have been made, however, in some data to 
make it more timely, footnotes have been revised in line with later 
editions, and in the selected references older texts have been replaced 
with references to material more readily available to both student 
and teacher. 

In addition, material on general policies for organizing testing
programs has been added to Chapter XIX, Program Organization
and the Technique of Test Administration.


ACKNOWLEDGMENTS 


We desire to express our thanks to numerous friends for sugges- 
tions and particularly to Dr. Harl Douglass of the University of 
Colorado and Dr. C. L. Huffaker of the University of Oregon. In 
the text we have given credit for the tables and tests as they are 
specifically mentioned and our thanks for permission to use them is 
hereby gratefully recorded. 

The generous privilege to use quotations and tabulations was 
given by A. S. Barnes and Company; D. C. Heath and Company; 
Lea & Febiger; Charles Scribner's Sons; W. B. Saunders Company; 
Bureau of Publications, Teachers College, Columbia University, 
and American Council on Education. 

Much of the experimental work in physical education has 
appeared in journals and periodicals, and to the editors and publish- 
ers we are glad to acknowledge the assistance they gave. Particular 
mention should be made of the American Physical Education Re- 
view; Research Quarterly of the American Association for Health, 
Physical Education, and Recreation; Journal of Health and Physical 
Education; Medicine; Modern Medicine Library; Journal of the 
American Medical Association; Archives of Internal Medicine; 
Mind and Body; Medical Journal and Record; Journal of Applied 
Psychology. 

Numerous educational school boards have contributed valuable 
tests and standards of measurement. We mention particularly the 
California State Board of Education, Board of Education at Detroit, 
School District for the City of Los Angeles, Playground and Recre- 
ation Association of America, American Child Health Association, 
University of Illinois Bulletins, University of Oregon Press, Cam- 
bridge University Press. 

For figures and illustrations we are grateful to Charles Scribner's 
Sons for the chart used by Sargent, to Professor Frank Kleeberger 
of the University of California for the pictures of the photographic 
measuring apparatus and the chart for scoring physical education
measurements, and to Professor W. R. LaPorte of the University 
of Southern California for the silhouettes used in the diagnosis of 
posture. 

The constant reference to the statistical works of Dr. H. E. 
Garrett of Columbia University, Dr. Karl J. Holzinger of the Uni- 
versity of Chicago, and Dr. T. L. Kelley of Harvard University is a 
debt which should receive mention beyond mere footnote citation. 




Appreciation must be recorded for adaptations made from the text 
of Drs. G. M. Ruch and G. D. Stoddard on Tests and Measurements 
in High School Instruction and from the text of Dr. W. A. McCall
on Measurement, a Revision of How to Measure in Education.

We are deeply appreciative of the assistance given by all of these. 
Though we have drawn heavily from many sources, our one thought 
has been the advancement of the cause of physical education. 


JOHN F. BOVARD
FREDERICK W. COZENS
E. PATRICIA HAGMAN




TABLE OF CONTENTS


PART I. THE STATUS OF MEASUREMENT 
IN PHYSICAL EDUCATION 


Chapter I
THE NEED FOR AND USE OF MEASUREMENT .......... 3


Measurement and Evaluation Defined, 3.—The Importance
and Use of Evaluative Procedures, 6.—Appraisal of Pupil Progress,
7.—Diagnosis and Guidance, 8.—Classification, 9.—Motivation, 10.—
Instructional Methodology, 12.—Appraisal of Instructors, Methods and
Materials, 12.—Research, 13.—The Need for Measurement in Physical
Education, 14.


Chapter II


THE DEVELOPMENT OF MEASUREMENT IN PHYSICAL EDUCATION: BRIEF
HISTORICAL SKETCH .......... 17


(A) Early Development of Physical Education Measurement, 17.
—Development of Anthropometry, 17.—Development of Strength Tests,
21.—Development of Cardiac Functional Tests, 25.—Development of
Physical Ability Tests, 25.—Development of Indices for Measuring
Physical Efficiency, 30.—(B) Modern Developments in Physical
Education Measurement, 31.—Anthropometric Measurement, 32.—
Cardiovascular Tests, 33.—Athletic Achievement Tests and Scoring
Scales, 33.—The Classification of Pupils, 34.—The Measurement of
General Qualities, 35.—Physical Fitness and Motor Fitness Tests, 36.
—Sport Technique Tests, 37.—Knowledge and Information Tests, 37.—
Rating Scales, 38.


Chapter III 


ANTHROPOMETRIC MEASUREMENTS 


Hitchcock's Contribution, 39.—Sargent's Contribution—An Anthropo- 
metric Chart, 40.—Special Instruments for Measuring Postural Condi- 
tions, 45.—The Brownell Scale for Measuring Anteroposterior Posture, 
46.—Postural Measurement of the Pre-school Child, 46.—Korb's 
Comparograph, 47.—MacEwan-Howe Posture Measurement, 47.— 
Springfield Postural Measurements, 48.—Wickens and Kiphuth Posture 
Measurement, 49.—Use of Subjective Ratings, 49.—Foot Measurement,
50.—Anthropometric Measurements for Determining Nutritional
Status, 51.—The ACH Index of Nutritional Status, 52.—The Wetzel
Grid Technique, 53.—Pryor's Width-Weight Tables, 54.—Indices of 
Stature and Build, 54.—Studies Involving Body Build or Type, 56.— 
Sheldon: Somatotypes, 58. 


Chapter IV 


CARDIAC FUNCTIONAL TESTS


Principles involved in Cardiovascular Tests, 62.—Crampton’s “Blood
Ptosis” Test, 64.—McCurdy’s Condition Test, 65.—Meylan's Test, 65.
—Foster’s Test, 66.—The Barach Test, 67.—The Barringer Test, 68.— 
The Schneider Test, 69.—The U-Tube Manometer Test, 71.—The Pulse 
Ratio Test, 73.—The Harvard Step Test, 76.—Pack Test of Exercise 
Tolerance, 77.—McCloy’s Test of Present Condition, 78.—McCurdy- 
Larson Test of Organic Efficiency, 78.—Tests of Circulatory Fitness, 
79.—Group Physical Condition Tests for Boys and Girls, 80.—Typical 
Comments on Cardiac Functional Tests, 83. 


Chapter V
ATHLETIC ACHIEVEMENT TESTS AND SCORING SCALES


Elementary and Secondary Schools, 90.—The Athletic Badge Tests,
90.—Richards’ Efficiency Tests for Grade Schools, 91.—Philadelphia
Public School Age Aim Charts, 91.—Reilly’s Scheme—Rational Athletics
for Boys and Girls, 92.—California Decathlon, 93.—Detroit Decathlon
for Boys, 93.—Los Angeles Achievement Expectancy Tables, 94.—The
Motor Ability Tests, 94.—Rogers' Athletic Index, 95.—Bliss Study of
Progression, 96.—The Cleveland Physical Ability Test for Boys, 96.—
Jenkins Motor Achievements of Children, 96.—Fundamentals of Motor
Performance for Secondary School Girls, 97.—Colleges and Universities,
98.—Meylan's Test for Grading in Physical Education, 98.—Sigma
Delta Psi, 99.—Schuettner's Scheme for Stimulating Interest, 99.—
Metcalf's Standards Proposed to the College Directors, 100.—National
Amateur Athletic Federation Physical Efficiency Standards, 101.—
Oberlin College Test, 101.—United States Military Academy Physical
Efficiency Test, 102.—Achievement Tests in Activities for Physical
Education Teachers in Training, 102.—The University of Illinois Plan,
103.—Scoring Scales and Standards, 104.—McCloy’s Scoring Tables,
104.—The California Achievement Scales, 104.—The Junior Pentathlon
Program, 105.—Achievement Scales for Boys and Girls in
Elementary and Junior High Schools, 105.—Achievement Scales for
Boys in Secondary Schools, 106.—Achievement Scales in Physical
Education Activities for College Men, 107.—Achievement Scales in Physical
Education Activities for Secondary School Girls and College Women,
108.—Physical Fitness Pentathlon, 108.—National Standards of
Achievement for Girls and Boys, 108.—Standards for Boys, 109.—
Standards for Girls, 109.—Physical Performance Levels for High School
Girls, 110.—Achievement Scales in Motor Fitness Events, 111.


Chapter VI
INDICES FOR THE CLASSIFICATION OR GROUPING OF STUDENTS .......... 114


History of the Problem, 114.—McCloy’s Studies in Athletic Handicapping,
116.—The California Studies, 117.—Grouping of College Men, 119.
—Grouping of High School Girls, 119.—A Comparison of Age-Height-
Weight Classification Indices, 120.—Other Indices for Use in Homogeneous
Grouping, 120.—Strength Indices as Classifiers, 121.—Motor
Ability Indices as Classifiers, 121.


Chapter VII
THE MEASUREMENT OF GENERAL QUALITIES—STRENGTH AND POWER .......... 124


Strength Tests, 124.—Sargent's Test, 124.—Francis Galton's Test,
126.—The Ergograph, 126.—Kellogg's Dynamometer, 126.—Martin's
“Resistance Test”, 127.—Rogers' Strength Index, 128.—Rogers' Short
Strength Index, 129.—McCloy's Method of Scoring Chinning and
Dipping, 129.—Prediction of Total Strength, 130.—Multiple Strength
Indices of General Motor Ability, 131.—Weighted Strength Tests for
High School Girls, 132.—Weighted Strength Tests for High School
Boys, 132.—Weighted Strength Tests for Elementary School Children,
133.—Weighted Strength Tests for College Women, 133.—Strength and
General Athletic Ability for College Men, 134.—Recent Strength
Testing Research, 135.—Power Tests, 137.—The Physical Test of
Man, 138.—Schwegler and Englehardt Variation of the Sargent Test,
138.—The Leapmeter, 139.—Further Studies on the Sargent Jump
Test, 139.—The MacCurdy Physical Capacity Test, 141.


Chapter VIII 


THE MEASUREMENT OF GENERAL QUALITIES (Continued)—MOTOR ABILITY,
CAPACITY, AND EDUCABILITY; NEUROMUSCULAR CONTROL .......... 144


Tests of Motor Ability, Capacity, and Educability, 144.—Brace's 
Scale of Motor Ability Tests, 145.—The Iowa Revision of the Brace 
Scale of Motor Ability Tests, 147.—Sectioning Students into Homo- 
geneous Teaching Units, 147.—McCloy’s Test of General Motor Capac- 
ity, 149.—McCloy’s Test of General Motor Ability, 150.—General Motor 
Ability and Capacity Tests for the First Three Grades, 150.—Garfiel's 
Motor Ability Test for College Women, 151.—University of Oregon 
Motor Ability Test, 152.—The Minnesota Motor Ability Tests for 
College Women, 153.—The Humiston Motor Ability Test for College 
Women, 154.—Scott’s General Motor Ability Test for Girls and Women, 
155.—Motor Ability Test for High School Girls, 156.—A Measure of 
General Athletic Ability for the College Man, 156.—Motor Ability 
Tests for College Men, 158.—The Predictive Value of Selected Motor 
Ability Tests, 158.—Neuromuscular Control Tests, 159.—Tests of 
Capacity and Endurance, 159.—Tests of Coordination, 161. 


Chapter IX 
PHYSICAL FITNESS AND MOTOR FITNESS TESTS .......... 167


Physical Fitness and Motor Fitness Defined, 167.—Recent Fitness 
Measures, 171.—Physical Fitness Tests of the Armed Services, 171.— 
United States Office of Education, 174.—University of Illinois Motor
Fitness Tests, 175.—Motor Fitness Test for High School Girls, 175.—
The California Physical Fitness Pentathlon, 176.—Indiana University 
Motor Fitness Indices for High School and College Age Men, 177.— 
Indiana Physical Fitness Test, 177.—The Iowa Physical Fitness Battery 
for College Women, 178.—Yale Motor and Physical Fitness Tests, 178.— 
City College of New York Program of Health and Physical Fitness Eval- 
uation, 179.—The Andover Physical Fitness Testing Program, 179.— 
Illinois High School Physical Condition Test and Standards of Perform- 
ance, 180.—Earlier Fitness Measures, 180.—Indices for Measuring 
Physical Efficiency, 180.—The Use of Vital Capacity, 182.—Sargent’s 
Test of Speed and Endurance, 183.—University of California Physical 
Efficiency Test, 183.—College Freshman Physical Efficiency Test, 184.— 
Physical Efficiency Test for Freshmen College Women, 184.—The 
Measurement of Organic and Neuromuscular Fitness in College Women, 
185. 




Chapter X 


SPORT TECHNIQUE TESTS


Fundamentals for Measuring Technique in Sport, 189.—Specific Experimental
Contributions, 192.—Archery, 193.—Hyde's Archery
Scales, 193.—Camp Archery Association, 193.—Badminton, 194.—
Baseball, 194.— Playground Baseball, 194.—Basketball, 195.—A Test 
of Ability and Progress in Basketball (Men), 195.—Evaluating Abilities 
of Basketball Players (Men), 195.— Basketball Progress Tests (Boys 
and Men), 196.—The Measurement of Ability in Women's Basketball, 
196.—A Basketball Test for College Women, 197.—A Basketball Motor 
Ability Test, 197.—Achievement Tests in Girls’ Basketball, 198.—Field 
Hockey, 198.—Field Hockey Achievement Tests for College Women, 
198.—Field Hockey Achievement Scales, 199.—Gymnastics, 199.— 
Football, 199.—Achievement of College Men in Touch Football, 199. 
—Achievement of College Men in Varsity Football, 200.—Ice Hockey, 
200.—Rhythm Tests, 200.—Rhythmic Capacity of Physical Education 
Majors, 200.—Practical Rhythm Tests for Physical Education Groups, 
201.—Measurement of Motor Response in Rhythm, 201.—Response to 
Auditory Rhythms, 202.—Predictive Measures of Ability to Learn 
Dance Movements, 202.—Soccer, 203.—Soccer Skill Tests (1), 203.—
Soccer Skill Tests (2), 203.—Soccer Skill Test for the Fifth and Sixth 
Grade, 204.—Soccer Skill Tests for Ninth and Tenth Grade Girls, 204. 
—Achievement Scales in Soccer and Speedball, 204.—Speedball, 205.— 
Swimming, 205.—Cureton's Swimming Test for Beginners (Rotational 
Method), 206.—Cureton’s Intermediate and Advanced Tests, 207.— 
Test for Endurance in Speed Swimming, 208.—Speed Swimming Scales 
for Secondary School Girls and College Women, 208.—Achievement 
Scales in Wartime Swimming, 209.—Naval Aviation Swimming Stand- 
ards, 209.—Tennis, 210.—Essential Qualities for Tennis Players, 210. 
— Tests to Determine Progress in Tennis, 211.—The Dyer Back- 
board Test of Tennis Ability, 211.—Grading Beginners in Tennis, 211. 
—Table Tennis, 212.—Table Tennis Backboard Test, 212.—Track 
and Field, 212.—Scoring Tables for College Women, 213.—Percentile 
Scales for Boys, 213.—A Fall Decathlon for College Track Squads, 213.
— Volleyball, 214.—Practice Tests in Volleyball Skills, 214.—Achieve- 
ment Scales in Volleyball Skills, 214.—Achievement Tests in Volleyball 
for High School Girls, 214.—University of Wisconsin Volleyball Skill 
Tests, 215.—Repeated Volleys Test, 215.—All-Around Athletic 
Performance, 215.—Program for College Men and Women, 215.— 
The New York State Program, 217. 




Chapter XI
KNOWLEDGE AND INFORMATION TESTS .......... 219


(A) In Physical Education Activities, 220.—Badminton, 220.— 
Baseball (Playground), 221.— Basketball, 221.—An Appreciation Test in 
Dance, 221.—Field Hockey for Women, 222.—Golf, 222.—Ice Hockey, 
222.—Soccer, 223.—Soccer Rules, 223.—Swimming, 223.—Tennis, 223.
— Information Tests in Health and Physical Education for High School 
Boys, 224.—The Minnesota Physical Education Knowledge Tests, 
225.—Knowledge Test on Source Material, 226.—National Officials’
Rating Committee Tests, 226.—Knowledge Tests for Activities in the 
Major Curriculum, 226.—(B) In Health Knowledge, Habits, and 
Attitudes, 227.—Gates-Strang Health Knowledge Tests, 227.—Franzen 
Health Education Tests (American Child Health Association), 228.— 
Wood-Lerrigo Health Behavior Scale, 228.—Health Knowledge Test for 
Adults, 229.—Health Knowledge Test for College Freshmen, 229.— 
Health Knowledge Test for High School Seniors and College Freshmen, 
229.—First Aid Test, 230.—Diet and Dental Health Test, 230.—Health
Practice Inventory, 230.—Health Attitude Scale, 230.—Health Educa- 
tion Test: Knowledge and Application, 230.—Health and Safety 
Education, 231.—Health Practices, Knowledge, Attitudes and Interests 
of Senior High School Pupils, 231.—Additional Tests in Health Education,
231.


Chapter XII
RATING SCALES


Construction and Use of Rating Scales, 235.—Activity Rating Scales,
238.—The Rating of Player Performance in Basketball, 238.—Diving
Rating Scales, 239.—Ratings in Riding Competition, 240.—Form
Diagnosis Sheets, 240.—Some Sample Rating Forms, 240.—Self-Rating
of Physical Fitness According to Definite Standards, 241.—Rating of
Sports Officials (Women), 241.—Cureton's Multiple Rating Scales, 241. 
—Program Score Cards, 242.—Score Cards for Secondary School 
Physical Education Programs for Boys and Girls, 242.—Score Cards for 
Elementary and Secondary School Health and Physical Education, 
243.—A Check List for the Survey of Health and Physical Education 
Programs in Secondary Schools, 243.—Score Card for Y.M.C.A. Physical 
Education Programs, 243.—Score Card for Physical Education Programs 
for Physically Handicapped Children, 244.—Behavior Rating Scales, 
244.—McCloy’s Behavior Rating Scale, 244.—A Technique for Measur- 
ing Sportsmanship, 245.—Social Efficiency, 246. 




PART II. TOOLS OF MEASUREMENT; 
A BRIEF OUTLINE OF STATISTICAL METHODS 


Chapter XIII 
ELEMENTARY STATISTICAL METHODS


The Frequency Distribution, 252.—Selecting the Size of Each Interval, 
253.— Midpoint, 255.—Measures of Central Tendency, 262.— Measures 
of Variability, 267.—The Reliability of Various Measures, 272. 


Chapter XIV
ELEMENTARY STATISTICAL METHODS (Continued) .......... 276


Correlation, 276.—Regression, 281.—The Standard Error of Estimate, 
284.— Correlation Ratio, 285.—Partial and Multiple Correlation, 285. 
—The Computation of Classification Indices or Formulas, 288.—Solution 
of a Five-variable Problem, 290. 


Chapter XV


ELEMENTARY GRAPHICAL METHODS; SOME PROPERTIES AND USES OF THE
NORMAL CURVE .......... 296


Simple Algebraic Principles, 296.—Frequency Polygon, 297.—Histogram,
298.—Column or Bar Diagram, 299.—The Ogive Curve or Percentile
Graph, 300.—An Important Application of the Normal Curve,
302.—Transmuting Judgment Scores into Scores on a Linear Scale, 305.


Chapter XVI 
METHODS OF SCORING TESTS .......... 309


Pass or Fail, 310.—Success or Failure, 311.—Minimum Standards with 
Additional Point Awards for Better Performances, 312.—The Division
into Classes or Groups, 313.—Standard Scores or Measures, 314.—
The T-Score, 315.—Even-Step Interval, 317.—Increased Increment, 318. 




PART III. THEORY AND PRACTICE 
OF TEST ADMINISTRATION 


Chapter XVII 


CRITERIA FOR SELECTING TESTS


Validity, 327.—Reliability, 329.—Objectivity, 333.—Administrative
Economy, 333.—The Use of Norms, 334.—Duplicate Forms, 336.—
Standardized Directions, 337.


Chapter XVIII


TEST CONSTRUCTION IN PHYSICAL EDUCATION .......... 339


The Determination of the Quality to be Measured, 339.—Setting
Up Criteria for Establishing Validity, 341.—Preliminary Try-Out
of the Tests or the Assembly of a Trial Battery of Tests, 345.—A
Typical Problem, 345.—Selection of Test Items, 346.—Preliminary
Try-Out, 346.—Securing Reliability of Single Tests, 347.—Elimination
of Tests, 347.—Securing an Adequate Criterion Score, 348.—Experimental
Conditions, 349.—Scoring, 349.—Selection of the Final
Batteries, 349.—Combining the Tests, 350.—Multiple Correlation of
the Battery with the Criterion, 354.—Standard Deviation of the Battery,
356.—The Determination of Battery Reliability, 357.—Other Means
of Computing Battery Reliability, 358.—The Prediction of Physical
Skill from a Test Score, 359.—Establishing Physical Activity Age Norms,
362.—Preparation of a Manual for Use in Administering the Test
Battery, 364.—Purpose of the Test, 364.—The Basis for the Construction
of the Test, 364.—A Description of the Test, 364.—Method of
Validation, 364.—Reliability of the Test, 364.—Standardization and
Directions for Scoring, 364.—Use of the Results, 364.


Chapter XIX
PROGRAM ORGANIZATION AND THE TECHNIQUE OF TEST ADMINISTRATION .......... 366


Organization of the Testing Program, 366.—The Technique of
Test Administration, 369.—Preliminary Arrangements, 370.—
Directions for Giving the Tests, 371.—Guiding Principles in the Preparation
of Instructions, 373.—Preparation for the Testing, 374.—Organization
of Examining and Recording Assistants, 375.—Scoring the
Results, 379.—Section Assignments, 379.




Chapter XX


DIAGNOSIS .......... 382


Purpose of Diagnostic Tests, 382.—Requirements for Diagnostic Tests,
384.—Individual Classification, 385.—Group Classification, 387.—
Predicting Athletic Success, 388.—Statistical Procedures in Diagnosis—
Individual Items versus Group Averages, 388.


APPENDIX .......... 391


Areas of the Normal Curve, 392.—The Transmutation of an Order of
Merit into Units of Amount or “Scores”, 393.—Table for Transmuting
an Order of Merit into Unit Scores on the Basis of 10 or 100 (Range — 60),
394.—Table of T-Scores, 395.

Physical Growth Record for Boys, 4-11 Years of Age, 396.—Physical
Growth Record for Boys, 11-18 Years of Age, 397.—Physical Growth
Record for Girls, 4-11 Years of Age, 398.—Physical Growth Record for
Girls, 11-18 Years of Age, 399.


PART I
The Status of Measurement
in Physical Education


CHAPTER I 


The Need for and Use of Measurement 


Philosophers as well as scientists have long since concluded that
understanding a situation is essential to its effective control. In all 
aspects of life (social, political, economic and educational), ever-increasing
emphasis is directed toward research, analysis, and other
means to increase insight. Progress in all major areas of education, 
administrative, supervisory and instructional, has been inevitably 
influenced by the increasing application of scientific methodology 
to educational problems. A compendium of modern education 
could well be written in terms of the advancement of educational 
measurement. The ever-expanding interest in all forms of inquiry
in education, and the widening recognition of the need to understand 
the true values involved, have given rise to common usage of the 
term evaluation. Few treatments of current educational problems 
omit concern with the evaluative process. 


Measurement and Evaluation Defined 


Considerable confusion exists because of the evolving use of the 
term evaluation. Is evaluation synonymous with measurement, or 
has it replaced the term in educational parlance? A brief definition 
of the two terms will provide needed clarification. 

Evaluation may be defined as the process of appraising the effec- 
tiveness of the attainment of educational goals. There are several 
varieties of educational evaluation. Troyer and Pace summarize 
these briefly as follows: “There is evaluation of individual programs 
of single courses, of total programs or major parts thereof. There 

3 


4 The Status of Measurement in Physical Education 


is evaluation in regular or normal settings and in experimental 
settings. There is evaluation as an ongoing continuous activity, and 
evaluation that is periodic. And there is self-evaluation and evaluation by others."¹

The process of evaluation involves three steps. The first step is to 
define and appraise objectives. Evaluation presupposes an under- 
standing of the specific goals or objectives of a given educational 
experience, a basic principle of evaluation being: evaluation is done 
in terms of the objectives sought. It further presupposes the worth of the 
objectives. Merely to have objectives is not enough; the worth of 
the objective must first be assured. The second step is to collect data. 
The process of evaluation utilizes all procedures, both quantitative 
and qualitative, which may be used to collect data necessary to 
appraise the extent to which the educational objectives have been 
achieved. The third step is to judge the educational significance, in 
light of the objectives sought, of the information and data collected. 

Measurement, then, can be defined in terms of the second step in 
the evaluative process . . . evaluative procedures for the collec- 
tion of data. Measurement refers to those evaluative procedures 
which are precise, objective, quantitative and whose findings are 
capable of statistical treatment. Measurement characteristically 
indicates its findings in numerical form. The scores or results of 
measurement are not in themselves significant or self-explanatory. 
They become significant only after appraisal and interpretation in 
light of all available data. Measurement determines status. The 
evaluative process judges the educational significance of that status. 

Measurement utilizes standardized tests, achievement scales, rating scales, score cards and other measuring instruments, the use of which results in findings recordable in terms of time, distance, number, amount or quantitative symbol. Qualitative evaluative procedures, in contrast to measurement, defined as quantitative evaluative procedures, include simple observations and ratings, logs, interviews, case histories, anecdotal records, self-appraisals, check lists, and other devices through which subjective judgments are made, or observations recorded. Measurement provides objective, precise information about certain qualities which lend themselves to such analysis. Other evaluative procedures aim to provide information about facets of human behavior or social organization which do not readily lend themselves to objective analysis.

¹Troyer, Maurice E. and Pace, C. Robert, Evaluation in Teacher Education, p. 6. Washington, D. C., Commission on Teacher Education, American Council on Education, 1944.

The need to utilize objective evaluative techniques whenever 
possible is emphasized by Traxler, who says: 

Measurement should be the preferred means of evaluation in those 
areas to which it is clearly applicable. Measurement is constantly 
increasing its domain, is forever pushing outward toward the perim- 
eter of the field covered by evaluation, but since there are certain 
areas in which techniques of measurement still leave much to be 
desired, other techniques of evaluation must be utilized in order to 
complete the information concerning an individual student and to 
provide the well-integrated picture that is needed . . . . Thus conceived the two terms cannot possibly be in conflict as they are sometimes thought to be.²


An example from the field of physical education will further clarify the terms defined. To adhere to the basic principle of evaluation, that it must be done in terms of the objectives sought, the first step requires a brief look at objectives. Educational objectives are derived from three sources: the societal structure, the developmental needs and status of pupils, and the social function of the school.³ A soundly structured program of physical education will have (1) a clearly stated aim, compatible with existing educational philosophy, (2) a set of clearly stated objectives representing concrete goals in light of the aim, (3) a planned program of activities selected in light of objectives, with specific objectives for each activity in the program, and (4) established means for evaluating the extent to which objectives are achieved.⁴

Among widely accepted objectives for physical education are the development of neuromuscular skills, physical fitness and social efficiency. In light of these general objectives, the development of selected game skills and effective team membership might be two of many acceptable objectives for a basketball class for secondary school boys or girls. In the process of evaluating the extent to which a pupil or class achieves these two objectives, both quantitative and qualitative appraisals must be applied. Later in this text the complexity of measuring even game skills on a purely quantitative level will be revealed. Considerable progress has been made in this direction, however, and tests are available to measure knowledge of rules, ability in several of the game skills and so forth. No comparable tests exist to measure emotional control in competitive situations, ability to cooperate with teammates, or sportsmanlike attitude and conduct toward opponents, all equally important outcomes of the teaching situation. Measurement has entered this area through the development of rating scales of various kinds, but in addition, the use of teacher observations, check lists, anecdotal records, self-ratings, and other qualitative appraisals becomes essential to full understanding of the educational growth involved.

²Traxler, Arthur E., "Individual Evaluation," p. 16, New Directions for Measurement and Guidance. American Council on Education Studies, Series I, Reports of Committees and Conferences, Number 20. Washington, D. C., Vol. VIII, August, 1944.

³Barr, A. S., Burton, William H. and Brueckner, Leo J., Supervision, p. 183. New York, D. Appleton-Century Company, Inc., 1947.

⁴Williams, Jesse Feiring and Brownell, Clifford Lee, The Administration of Health and Physical Education, p. 375. Philadelphia, W. B. Saunders Company, 1946.

Obviously, the outcomes of physical education cannot be fully measured by the statistical tests and measurements now existing in the field. However, only through a complete and thorough understanding of existing tests and measurements can maximum use be made of precise evaluative tools, and skill be improved in the use and synthesis of all means of evaluation. The primary purpose of this text is to describe the status of measurement in the field of physical education. It further aims to assist the student preparing for teaching and the teacher of physical education to increase his competency in the use of tests and measurements, thus equipping him with essential tools for the evaluative process.


The Importance and Use of Evaluative Procedures 


The foregoing definition assumed that evaluation is an important 
phase of the educational process. Is this a valid assumption? Why 
evaluate? The primary purpose of evaluation is the improvement 
of instruction. Only through a careful analysis of the degree to which 
desired goals are reached can the learning experiences provided be 
properly modified or fortified. Only by the accumulation of accurate 
information about the pupil can contributions be made to his indi- 
vidual needs, on the basis of his difficulties, his strengths and his 
weaknesses. Evaluation seems absolutely essential to the improve- 
ment of teaching techniques and conditions of learning. 

Either directly or indirectly the instructional program may be improved through the use of evaluative procedures for (1) appraisal of pupil progress, (2) diagnosis and guidance, (3) classification of pupils, (4) motivation, (5) instructional methodology, (6) appraisal of instructors, methods and materials, and (7) research.

1. Appraisal of Pupil Progress. Since education involves 
development, the physical education teacher must know to what 
extent development takes place and how the pupil progresses from 
month to month and from year to year. The rate of progress made 
by pupils may be a very important consideration in the development 
of a technique of teaching, and will undoubtedly be of importance 
to the teacher in stressing various parts of the program. 

The Assignment of Marks or Grades. In keeping with accepted 
educational philosophy the final mark given to pupils in physical 
education should be based on the degree to which major objectives 
have been realized. It will include, then, (1) measures of skill, 
(2) measures of knowledge of rules and strategy, (3) evaluation of 
the part which physical activities play in the student's social develop- 
ment, and (4) evaluation of any other objective specific to the 
course of instruction. The mark in physical education which represents the teacher's subjective opinion of the pupil's attendance, effort, interest, improvement and conduct is entirely inconsistent with present-day thought and should be discontinued. In its place should be substituted a composite score made up of the qualities mentioned above, or "reports of progress in the accomplishment of legitimate objectives measured in definite units."⁵ The value of giving marks in any school subject is a debatable point, but marking at present is a function of the school and as such must be properly done.⁶ The interested teacher of physical education will find Bookwalter's discussion of value in clarifying his thinking with regard to marking.
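A composite mark of the kind recommended above can be sketched as a weighted average. This is a minimal illustration, not the authors' method: it assumes each component (skill, knowledge of rules and strategy, social development, other course objectives) has already been converted to a common scale such as 0-100, and the weights in the hypothetical `WEIGHTS` table are values a teacher would set from the objectives of the course.

```python
# Hypothetical weights: the text names the components of the mark but not
# how heavily each should count.
WEIGHTS = {
    "skill": 0.4,
    "knowledge": 0.2,
    "social_development": 0.3,
    "other_objectives": 0.1,
}


def composite_mark(component_scores, weights=WEIGHTS):
    """Weighted average of component scores that share a common scale.

    component_scores: dict mapping component name -> score, with every
    score already converted to the same scale (for example 0-100).
    """
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Dividing by the total weight keeps the mark on the component scale
    # even if the chosen weights do not sum exactly to 1.
    total_weight = sum(weights.values())
    weighted_sum = sum(component_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


pupil = {"skill": 80, "knowledge": 90, "social_development": 70, "other_objectives": 60}
print(round(composite_mark(pupil), 1))  # 77.0
```

Because every component must be on the same scale before weighting, a raw time or distance would first need converting, for instance to a standard or scale score.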

Promotion. The mark given the pupil in physical education is not necessarily the only or most important consideration in distinguishing among abilities. In physical education discussion groups the thought has often been advanced that minimum standards of achievement in activities should be set up for the various age or grade levels and that advancement from one level to another be determined by the pupil's ability to pass these minimum standards. Certainly the scales already in use will be found valuable for this purpose and will furnish objective evidence of achievement.

⁵Neilson, N. P. and Cozens, Frederick W., Achievement Scales in Physical Education Activities for Boys and Girls in Elementary and Junior High School, p. 160. New York, A. S. Barnes and Company, 1934.

⁶Bookwalter, Karl Webber, "Marking in Physical Education," Journal of Health and Physical Education, Vol. VII, No. 1 (January, 1936), 16-19, 61-62.

2. Diagnosis and Guidance. The increasing efforts in education to understand and meet individual pupil needs have given rise to greater emphasis on guidance in school programs. Guidance endeavors to increase the compatibility between the complexities of individual differences and the equally complex demands of varying curricular and life experiences. Adequate guidance necessitates an evaluation of capacities and abilities so that the teaching process can be adapted to individual needs.

The physical education needs of pupils are one of the most 
important considerations, and yet to discover these needs may often 
be the most difficult function of the teacher. In both school and 
college, where teachers deal almost entirely in mass education, 
diagnostic testing becomes a large undertaking, and takes time, 
thought and careful administrative planning. Teachers often believe 
that one test should furnish all the desired information about a 
pupil. Several physical education tests have been criticized on this 
particular basis. Physical activity needs cannot be diagnosed by 
the application of one simple measure. It must be realized that 
physical ability, like mental ability, is highly complicated. In using 
diagnostic tests judgment must be exercised with regard to the 
amount of faith placed in a test battery or in a single item of the test 
battery. The evaluative technique of utilizing cumulative records 
which include the results of many different tests and evaluative 
procedures and follow pupils throughout their school careers is of 
particular value in diagnosis and guidance. 

Diagnostic testing, the discovery of strengths and weaknesses, 
must be followed by a program designed to meet the needs revealed; 
in other words, the program should arise from the needs of the pupil. 
This further implies that diagnostic testing and guidance are educational devices designed to make education most effective for all pupils, rather than processes applicable only to the exceptional child.

Within the past few years a number of significant contributions have been made in the field of physical education along the lines of predicting present general status and probable development. Several studies have been directed toward the building of tests which will be valuable in predicting success in specialized athletic sports.⁷ In the future more measuring devices of this nature may be expected. Such tests will be useful in guiding boys and girls into competitive team games and in directing students into activities in which they show possibilities of skill for use in leisure time. Judging from the knowledge gained by merely scratching the surface of measurement in physical education, developing prognostic tests for some aspects of the program appears somewhat simpler than in many other educational fields.

⁷See Chapter VIII.

3. Classification. Classes in physical education are usually organized on the basis of class or grade in the elementary school and of the available class period in the secondary school, rather than on ability or skill. The programming in physical education too often becomes a matter of fitting periods of physical education around a schedule made out largely with reference to the needs of academic departments. It is not unusual for thirteen- and fourteen-year-old boys to be grouped for instruction in physical education with boys of seventeen, eighteen and even nineteen. These 
boys cannot possibly be similar either in physical maturity or in 
level of skill performance. Can boys of such widely different physical 
make-up be given the same program, the same kind of activity? 
The logic of the situation demands that students be separated 
according to their general ability and skill and be given work 
arranged according to progressive learning experiences. 

Opportunities must be provided for the skilled as well as the 
unskilled to learn new activities. One educational problem of some 
significance is what can be done for the exceptional student. Class 
exercises are pointed toward the mediocre performer all too often. 
Education has no right to dull the curiosity of the skilful man by 
forcing him to react in the non-stimulating atmosphere of the 
ordinary. Conversely, placement requires an individual assignment 
for the restricted and below normal student. These students cannot 
profit by the performance of the particularly skilful, and they need 
for the most part an attitude and an emotional adjustment not 
common to the normal student. 

Recognition of the recreational objectives of physical education 
depends on the pleasure obtained in pursuit of games or other 
activity. Is it any fun for the skilful player to compete against the 
“dub” or is it reasonable to expect the “dub” to be happy and to 
extend himself hopefully in competition with the expert? 

The only answer to these questions is sectioning according to ability. Comprehensive tests will provide the instructor with tools for the separation of his class into relatively homogeneous groups. Sectioning permits the use of a course of study and of teaching methods adapted to meet the needs of individuals.
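Sectioning by ability can be sketched as ranking pupils on a single test score and cutting the ranked list into sections of nearly equal size. This is a minimal sketch under stated assumptions: each pupil has one composite score, a higher score is better, and sections should be as equal in size as possible; the function name `section_by_ability` and the sample names and scores are hypothetical.

```python
def section_by_ability(scores, n_sections):
    """Split a class into n_sections groups of near-equal size by rank.

    scores: dict mapping pupil name -> test score, higher meaning better.
    Returns a list of sections (lists of names), strongest section first.
    """
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    # Rank pupils from highest to lowest score; Python's sort is stable,
    # so tied pupils keep their input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    base, extra = divmod(len(ranked), n_sections)
    sections, start = [], 0
    for i in range(n_sections):
        # The first `extra` sections take one additional pupil each.
        end = start + base + (1 if i < extra else 0)
        sections.append(ranked[start:end])
        start = end
    return sections


class_scores = {"Ames": 90, "Baker": 70, "Cole": 85, "Dunn": 60, "Ellis": 75}
print(section_by_ability(class_scores, 2))
# [['Ames', 'Cole', 'Ellis'], ['Baker', 'Dunn']]
```

Equal-sized sections are only one choice; a teacher could instead cut at fixed score boundaries, which yields sections of unequal size but more uniform ability within each.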

Considerable attention has been given in recent years to the 
classification of boys and girls according to the factors of age, height 
and weight for purposes of homogeneous grouping. These factors 
play a rather prominent part in determining physical performance.⁸

In addition to classifying students for instruction on the basis of 
skills, ability or predictive devices, other bases warrant attention 
for the total physical education program. The health examination 
is prerequisite to other classification methods. Health needs of 
selected students may supersede ability or skill as the method of 
classification for an activity. In the recreational program interest 
alone offers a valid means of grouping. The objectives of the course 
of instruction determine the factors by which students should be 
classified for the activity. Whatever the basis, however, proper 
classification will improve the instructional setting. 

4. Motivation. Achievement scores in physical education serve as a means to motivate students. They represent an objective measure, rather than a teacher's subjective opinion, which the pupil can use in determining his present level of ability. Except in rare 
instances, pupils are interested in learning of their present status, 
its meaning, and what they can do to improve themselves. They 
want to know where they stand in relation to the group in which 
they hold membership; they are eager to improve their performances 
day by day and are interested in comparing their achievements in 
various activities. This is true not only of individuals but also of 
groups — squads vie with each other, classes strive to outdo other 
sections, and schools endeavor to improve their standing in relation 
to their competitors. The spirit of competition engendered by 
objective scores made under identical conditions offers a valuable 
incentive to improvement. 

Boys and girls are always interested in charts of their performances in a variety of events. The Physical Ability Pentathlon Scale in Table I may be used quite readily by the teacher as a device for instilling interest in the pupil with reference to the improvement of his status. Average scores for each age-height-weight classification (A, B, C) are shown as 5, and the other scores extend to three standard deviations on each side of the mean. The student is thus able to tell at a glance where his performance stands in relation to the group.

⁸See Chapter VI.

Table I. Physical Ability Pentathlon: Age-Height-Weight Classification
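The 5-centred scale described above can be sketched as a conversion from a raw performance to a scale point. This is a minimal sketch, not the construction used for Table I: it assumes, hypothetically, that the scale runs from 0 to 10 with the group mean at 5 and the extremes at three standard deviations below and above it, so that one scale point equals 0.6 standard deviation. The function name `scale_score`, its `lower_is_better` flag for timed events, and the sample norms are illustrative.

```python
def scale_score(raw, mean, sd, lower_is_better=False):
    """Convert a raw performance to a 0-10 scale point.

    Assumes (hypothetically) that 5 marks the group mean and that the
    0-10 range spans -3 to +3 standard deviations, i.e. 0.6 SD per point.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (raw - mean) / sd
    if lower_is_better:
        # Timed events: a smaller (faster) time is the better performance.
        z = -z
    point = 5 + z / 0.6
    # Round to the nearest point (Python rounds exact halves to even)
    # and keep the result inside 0-10.
    return max(0, min(10, round(point)))


# Hypothetical norms for one event within one classification group.
print(scale_score(106, mean=100, sd=10))                         # 6
print(scale_score(7.0, mean=7.5, sd=0.5, lower_is_better=True))  # 7
```

Separate means and standard deviations would be needed for each event and each classification group (A, B, C), since the text sets the average at 5 within each classification.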

Testing, when a regular phase of the physical education program, 
serves as one way to increase the importance of the program in the 
minds of pupils. It tends to put the physical education program on 
a level with other school subjects in which frequent testing is 
accepted procedure. Further, the knowledge of the nature of the 
tests to be given may be expected to condition the learning directions 
of the student, as well as the extent of his efforts. Conversely, of 
course, a poorly administered testing program may have negative 
effects on the learning of the student. 

Motivation is a valid reason for testing in so far as teachers do not use the motivational values of testing as sole justification, and subsequently fail to make use of test results for other desirable purposes.

5. Instructional Methodology. Testing and evaluation as a 
teaching device have a distinctive place in the physical education 
program. Many of the activities are of a "self-testing" nature, such 
as track and field events, stunts, gymnastics and tumbling, archery 
and bowling. Some of these and additional activities, such as 
swimming and various forms of dancing, lend themselves to achieve- 
ment progressions as a means of instruction in the activity. The 
chief danger in using testing as a teaching device lies in the fact that 
in many instances it is used in lieu of instruction or guidance from 
the teacher. If the teacher controls this factor, testing and the use 
of achievement progressions and self-testing activities become a 
desirable instructional method. 

Any testing becomes an instructional device when it gives the 
student an insight into his accomplishments and increases his 
awareness of the elements considered important in a learning situa- 
tion. The actual practice of the skills during the test, or the exposure 
to the knowledges in the case of written tests are learning processes, 
provided the student gains insight into his level of accomplishment, 
and is motivated toward continued improvement. Other evaluative 
procedures such as biographical sketches, self-ratings, attitude 
analyses and check lists are valuable teaching devices. These serve 
to assist the student in understanding his own objectives in the 
learning situation and to increase his readiness for the desired 
learnings. 

6. Appraisal of Instructors, Methods and Materials. In any educational situation it is often necessary or desirable to appraise the efficiency of the instructors, including the methods and materials 
used. The ultimate criterion for judging good teaching or teaching 
techniques is the effect on pupil growth. The need for utilizing many 
types of evaluative procedures to determine pupil growth has been 
previously indicated. Measurement will prove to be a valuable aid 
in such surveys, but considerable caution must be taken. If achieve- 
ment tests are used as a basis for the determination of pupil progress 
in different schools, good judgment must be exercised in equalizing 
all outside factors which may have a bearing on the progress of 
students. To attempt to make comparisons in teaching efficiency 
without equalizing outside factors would be obviously unfair. These 
factors may include one or more of the following: 


(1) The experience of the teachers concerned. 

(2) The ability and experience of the pupils. 

(3) The type of community and heritage of the pupils. 

(4) Working conditions of the physical education plant. 

(5) Content of the course of study in the various schools surveyed.


7. Research. The use of evaluative procedures for research has 
two ramifications. First, the need exists for research to increase the 
objectivity and validity of all types of evaluative procedures. This 
has been indicated in the previous discussion regarding the pro- 
gressive expansion of the scope of measurement in relation to evalua- 
tion. Second, measurement per se is a basic tool of research. In 
every type of experimental work in physical education, measurement 
of some kind is necessary to carry the problem to a successful con- 
clusion. At every school level achievement tests are needed in 
measuring ability in various activities. Measurement in the field of 
dance is only beginning to show progress. Many possibilities present 
themselves for research in regard to the efficiency of various methods 
of instruction and class size. In matters pertaining to administration and curriculum content, controlled experiments demand the use of advanced statistical techniques, and measurement is vital to the investigator in the many phases of the psychologic, physiologic and sociologic aspects of physical education. It is important to recognize that, while all evaluation is not research, research plays a prominent role in programs of evaluation.


The Need for Measurement in Physical Education 


Significant progress in physical education during the past thirty years has been due in large part to the growing scientific interest of teachers in the problems of the profession. The concept that further progress can be made only by means of increased knowledge of scientific procedures has little by little pervaded the entire professional atmosphere. Measurement must unquestionably be grouped in the category of scientific procedures essential to continued professional progress. The history of the scientific movement reveals how directly the advancement of a given science is related to the degree to which measurement has developed within that field. A simple example of this truth is the effect of the invention of the telescope on the advancement of the science of astronomy. Similarly, the student of education recognizes the far-reaching effects of the development of measures of intelligence on educational methodology and programs.

As will be discovered later in this volume, measurement in physical 
education is not a new idea. The historical survey shows that the 
thoughtful teacher has been endeavoring to rate pupils and measure 
their progress for a long time. Some of this evaluation has been 
objective, some has been purely observational. Until about 1925 
practically all of it was more or less unscientific. Increased knowl- 
edge and use of adequate statistical techniques have made possible 
the construction of an ever increasing number of scientific measuring 
devices. 

Despite considerable progress, measurement in physical education, as in all education, is still in the pioneer stage. It is true that many imperfections exist. This fact unfortunately deters many teachers from attempting adequate measurement programs. As Ross⁹ so aptly pleads the case, the limitations of educational measurement devices should not deter their use, but rather should stimulate the user to additional effort. Limitations of measurement add to the difficulties, but do not detract from the importance. Advance in measurement in physical education will come only as the individual teacher in the field increases his knowledge of available tests and measurements, his sensitivity to their importance and use, his skill in applying measurement to his physical education program, and his ability to interpret results in light of the limitations of the measures used.

⁹Ross, C. C., Measurement in Today's Schools, p. 15. New York, Prentice-Hall, Inc., 1947.

¹⁰Ibid.

The theses of this text are, first, that any program of physical 
education to be termed adequate must attempt to appraise the 
worth of its outcomes; and, second, that the adequacy of the process 
of evaluation will be enhanced by the extent to which available tests 
and measurements are understood and used effectively. 


Selected References 


BARR, A. S., BURTON, WILLIAM H. and BRUECKNER, LEO J.: Supervision. (Second Edition). New York, D. Appleton-Century Company, Inc., 1947. Pp. 879.
Chapter V, "Determining the Objectives of Education," will lead to the understanding of objectives that is essential to the evaluative process. Other chapters, on the appraisal of the educational product and on evaluating the means, methods and outcomes of supervision, are of interest to the student of measurement.
BOOKWALTER, KARL WEBBER: "Marking in Physical Education," Jr. Health and Phys. Educ., Vol. VII, No. 1 (January, 1936), 16-19, 61-62.
Presents an interesting and rather thorough discussion of marking: purposes, criteria, past mistakes, and a suggested scheme for the assignment of marks. Contains excellent references to the entire problem.
COWELL, CHARLES C.: "Evaluation versus Measurement in Physical Education," Jr. of Health and Phys. Educ., Vol. XII, No. 9 (November, 1941), 499-501.
Indicates the importance of evaluation in physical education, lists five steps in evaluation procedures, stresses understandings the teacher must have to utilize evaluation effectively, and discusses the pros and cons of marking in physical education.

GREENE, H. A., JORGENSEN, A. N. and GERBERICH, J. R.: Measurement and Evaluation in the Secondary School. New York, Longmans, Green and Co., 1943. Pp. 670.
Chapter I defines the meaning of evaluation and the relationships of tests and measurements thereto. Other chapters of interest to students of physical education measurement deal with the construction of informal objective tests, the use of measurement devices in pupil guidance, and measurement in health and physical education.


KOZMAN, HILDA CLUTE, CASSIDY, ROSALIND and JACKSON, CHESTER O.: Methods in Physical Education. Philadelphia, W. B. Saunders Company, 1947. Pp. 552.
The entire book illustrates the complete synthesis of the process of evaluation in all aspects of teaching, although measurement as defined in this text is not considered. See particularly the chapters "Guidance Techniques and Tools," "You, the Teacher: Self Appraisal," and "Building the Physical Education Program."
NATIONAL SOCIETY FOR THE STUDY OF EDUCATION: The Forty-Fifth Yearbook, Part I: The Measurement of Understanding. Chicago, The University of Chicago Press, 1946. Pp. 558.
The introductory chapters deal with the importance of understandings as opposed to rote learnings in education, and the need for measurement in this area. Chapters XI and XII outline the understandings desired in health and physical education, and describe available measurement tools.

REMMERS, H. H. and GAGE, N. L.: Educational Measurement and Evaluation. 
New York, Harper and Brothers, 1943. Pp. 580. 

Chapter I, “Why Evaluate," discusses the purposes of evaluation, and 
Chapter II, “Achievement of Instructional Objectives,” indicates why teachers 
must formulate objectives, and how to do so. 

ROSS, C. C.: Measurement in Today's Schools. (Second Edition). New York, 
Prentice-Hall, Inc., 1947. Pp. 551. 

Contains valuable chapters on the historical development of measurement 
in education, the importance of measurement, and measurement in motivation, 
practice, diagnosis, school marks, classification and promotion, guidance, 
evaluation and public relations. 

TROYER, MAURICE E. and PACE, C. ROBERT: Evaluation in Teacher Education.
Washington, D. C., Commission on Teacher Education, American Council
on Education, 1944. Pp. 368.

Chapter I presents a clear analysis of the characteristics of evaluation, and 
Chapter X discusses evaluation in the educative process.

WESTAWAY, F. W.: Scientific Method: Its Philosophical Basis and Its Modes of
Application. (Fifth Edition). New York, Hillman-Curl, Inc., 1937. Pp. 588. 

Supports the thesis that measurement is essential to the growth of a science. 
Also discusses the expected margins of error in measurement.

WILLIAMS, JESSE FEIRING and BROWNELL, CLIFFORD LEE: The Administration of
Health and Physical Education. (Third Edition). Philadelphia, W. B. Saunders 
Company, 1946. Pp. 483. 

Chapter XX, “Measurement in Health and Physical Education,” includes
the following topics of particular importance: the role of measurement, some 
basic theses, the uses of measurement, common errors in measurement, and the 
statistical versus the clinical approach. 

WOOD, BEN D. and HAEFNER, RALPH: Measuring and Guiding Individual Growth.
New York, Silver Burdett Company, 1948. Pp. 538. 

Outlines the basic problems in educational measurement and evaluation, 
uniquely presenting the entire material as dialogue and discussion among 
administrators, laymen, teachers and students. 




CHAPTER II 


The Development of Measurement 


in Physical Education: 
Brief Historical Sketch 


(A) Early Development of Physical Education Measurement
The history of measurement work in physical education may be 
divided roughly into five phases or stages: 


Steps in progress                      Approximate dates

(1) Anthropometric                     1860-1880
(2) Strength                           1880-1915
(3) Cardiac functional                 1900-1925
(4) Athletic ability                   1904-date
(5) Single test or index figure        1920-date

These periods overlap greatly and do not represent clear-cut successive
steps in the use of testing measures. Rather, the outline suggests
different ways in which the problem has been attacked, and the dates
given will help to fix the limits within which a particular type of
test was most used by the leaders in physical education.

Development of Anthropometry. Although the study of anthropometry
and of the relative proportions of the human body has received some
investigation in recent times, its beginnings reach back to the remote
civilization of India, where a treatise called Silpi Sastri investigated
the outline of the body by dividing it into 480 parts.1 The ancient
Egyptians also used a rough sort of anthropometry during the period
from the thirty-fifth to the


1Hitchcock, Edward, Report of the Committee on “Method of Physical
Measurement.” Read to the American Association for the Advancement of
Physical Education, 1886.




twenty-second century B.C.2 In an attempt to find some one anatomic
portion of the body that would be a common measure of all the other
structures, the body was divided into nineteen equal segments, each of
which was the length of the High Priest's middle finger. The art of
these ancients seems to picture the ideal as the heavy type of stature,
but later this was replaced by a lighter and less robust type.

The Argive sculptors of ancient Greece, particularly Phidias and
Polycletus, searched diligently for a unit of measurement to be used
in determining the correct proportions of the perfect, godlike man.
Phidias is said to have used as many as twenty models in trying to
find the right proportions of the human figure.3 Polycletus, after
careful study, fashioned a canon or model called the Doryphoros or
Spear Thrower which, by general consent, represented absolute
perfection in human proportion.4 He pictured the perfect man as
a fighter and an athlete, “broad-shouldered, thick-set and square-chested.”
This ideal lasted for about a hundred years, but, says McKenzie,5
“as the arts of civilization became more gentle, the desire
for a more slender and elegant type became greater. It was grace
rather than strength which began to appeal to the Greeks and the
form changed from strength to elegance and from power to skill.”
It is worth noting that the same change has taken place in our
modern conception of the perfect human during the last forty-five years.

The Roman sculptors, though they followed the Greek canons
to a certain extent, “developed original lines of thought in connection
with human proportions. We do not know, however, that they
derived these ideals from many measurements of proportions, but
have reason to believe that they were the result of the study of
graceful forms and of ripened judgment in regard to physical beauty.
The table of proportions given by Vitruvius does not give evidence
of actual measurements taken and compiled, but he probably drew
on older canons.”6


2McKenzie, R. Tait, “The Quest for Eldorado,” Am. Phys. Educ. Rev., XVII
(May, 1913), 295.
3Baldwin, Bird T., Physical Growth and School Progress. Washington, D. C.,
Bureau of Education Bulletin No. 10, 1914.
4Seaver, Jay W., Anthropometry and Physical Examination. Meriden, Conn.,
Curtis-Way Company, 1909.
5McKenzie, R. Tait, op. cit.
6Seaver, Jay W., op. cit., p. 9.




A rough sort of anthropometry continued to be used by artists and
sculptors down through the centuries. Baldwin7 tells us that “as
early as 1770 Sir Joshua Reynolds called attention in an address
delivered before the Royal Academy of Fine Arts, to the differences
in the measurement of the human form from childhood to adult
life.” An Italian sculptor, Alberti, proposed a module of 1 foot in
height, which was divided into ten degrees and minutes, as a
standard for the proportions of the human body.8 Baron Quetelet,
however, is the father of anthropometry and is reputed to have
coined the word in 1835.9 In the first two of the four volumes of his
work “Man and the Development of His Faculties,” or “An Essay upon
Social Physics,” he deals with the physical qualities of man and the
determination of the average man in general, and examines all that
relates to the life of man: his birth, death, strength, height,
agility, etc.10

“In 1854 a German, Carus, proposed an anatomical basis for
determining human bodily proportions, assuming the hand length
for the unit, and the adult vertebral column of 24 free vertebrae, to
be the key to these proportions.”11

The first important investigation of physical measurements of
adolescent boys was made in 1854 by Zeissing in a study of Belgian
children.12 Later, in 1860, Cromwell contributed a study of the
growth of Manchester school children for the ages from eight to
eighteen. In this study he discovered the general law “which has
been verified by every authority since that date with the exception
of Quetelet for normal children and Goddard for feeble-minded
children, i.e., girls are taller and heavier than boys from the
approximate ages of eleven to fourteen. The boys then become taller and
heavier and continue their growth longer.”13

The work of Hitchcock at Amherst, starting in 1861, is especially
important. His extremely careful measurements at first included only
the items of age, height, weight, the girths of chest, arm and
forearm, and the strength of the upper arm as measured by the
pull-up, but later he adopted the list of measurements recommended
by the American Association for the Advancement of Physical
Education, over 50 in number.

7Baldwin, Bird T., op. cit., p. 145.
8Hitchcock, Edward, op. cit.
9Baldwin, Bird T., op. cit.
10Smithsonian Report, 1874, pp. 169-183.
11Hitchcock, Edward, op. cit.
12Baldwin, Bird T., op. cit.
13Ibid., p. 143.


In 1880 Dr. D. A. Sargent of Harvard University began the
systematic measurement of students, and the data he gathered were
compiled and published in 1893 in the form of percentile tables for
the various years of college life for both men and women. The mean
or 50 per cent record was graphically represented in plastic figures
of both man and woman, and these figures were exhibited at the
World's Fair at Chicago in that year, where they created a wider
interest in the cultivation of the physical growth of students and
gave a healthier trend to the gymnastics that followed.14


A number of others have made notable contributions and among 
these Baldwin15 mentions Steet, who developed the idea of the
weight-height index. The interest in physical measurements re- 
ceived a world-wide recognition and we find many famous names 
attached to these studies, such as Galton and Roberts in England, 
Hertel in Denmark, Key in Sweden, Geissler in Germany, and 
Bowditch, Porter and Goddard in the United States. 


In 1902 a study of the growth of the human body from the fifth
birthday to the twenty-first was made by Dr. W. W. Hastings of
Springfield. The result of this work was the publication of a Manual
for Physical Measurements for both boys and girls. This book gives
in percentile form the records of children in the public schools of
Omaha, Nebraska, and adolescents from that state and Connecticut,
in groups for each year of age and for convenient gradations of
height. This material has been arranged on cards for individual
use in recording personal measurements graphically; it has thus
found a wide field of use and has given an excellent standard by
which even the untrained can detect at a glance the deviation from
the standard for that type.16


As far as physical education is concerned, the greater part of the
early anthropometric work placed emphasis on symmetry and size.
Sargent at Harvard and Hitchcock at Amherst, beginning about
1880, used their extended series of measurements as a guide in
determining norms for each age and worked out charts for each
individual showing how that individual compared with the norm,
not only on the basis of all measurements taken but also as far as
strength was concerned. Sargent’s chart, shown on page 41, will
illustrate.

14Seaver, Jay W., op. cit., p. 14.
15Baldwin, Bird T., op. cit.
16Seaver, Jay W., op. cit., pp. 14-15.

Development of Strength Tests. The shift of emphasis (about
1880) from symmetry and size to the measurement of the actual
work of an individual was no doubt hastened by the invention of the
spirometer and the dynamometer. Sargent's strength test idea was
first worked out in 1873 at Yale while he was still a medical student,
and later developed in more detailed fashion at Harvard, beginning
about 1880. His original plan in measuring strength “was to test the
efficiency of men in handling their weight by their arms as a
preliminary qualification for proficient work in heavy gymnastics.”17 He
concluded that it was capacity and not size alone which would be of
the greatest practical value and that, after all, tape measurements
really did not tell us much. Says Sargent, “There is in most men an
unknown equation which makes for power and efficiency which has
never been determined and which can only be measured by an actual
test.” His conclusion, that body size and measurement of muscles
alone did not furnish sufficient data upon which to base a judgment
of a man’s power and working capacity, was reached after keen
observation over a number of years, and remained the dominant idea
in physical education for upward of twenty years; in fact, a number
of leading institutions did not discard his strength test until about
1915.

The work of Kellogg in the late eighties and early nineties
emphasized the importance of exercise as a therapeutic measure
and led to the invention of the Universal Dynamometer, with which
he was able to test the strength of many groups of muscles.

“The earlier tests that were applied in anthropometrical
investigations related to the size or mass of the various parts of the
body.”18 It was supposed that exercise could be prescribed on the
basis of muscle size. It was soon demonstrated, however, as Seaver
points out, that other data were needed, as the “large man is not
always the strong man, and with equal truth it may be said that the
strong man is not always the man of high endurance.” This last
statement was the very idea which caused the decline of the strength
test. Besides not always being the man of high endurance, the strong
man very often cannot use his strength to the best advantage.
Practical experience has demonstrated this, and a call for a new type
of test resulted.

17Sargent, Dudley A., “Twenty Years Progress in Efficiency Tests,” Am. Phys.
Educ. Rev., XVIII (October, 1915), 452.
18Seaver, Jay W., op. cit.

Most of us will probably agree that what we are looking for now
in physical education is not inherent strength, size or symmetry, but
ability to use the muscle power for performance in the various play
elements and for skill in handling the body in the daily routine of
living. Anthropometric measurements and the strength test, however,
include a number of points of special value in the testing work of
today. Age, height and weight are all important, and no matter
what the variety of test, these facts must be collected. Without
critical analysis, data of any sort are practically valueless. Many
files of strength records still repose in some musty box, never to be
used, or, if in use, need a thorough scientific examination.

The idea of total capacity is present in the strength test. How
does this individual rate in relation to the average, or against this
other individual? This determination is of value not only for the use
of the instructor in prescribing for the individual but also to give
the student a better idea of where he stands in relation to his fellows.
Therefore, “rating” is necessarily an important element in physical
education. In the present state of our science, any definitely
determined physical ability rating, whether the method used is absolutely
correct or not, is to be regarded as a distinct contribution. Then, too,
there is present in the strength test the element of competition
against one's own record and that of one's associates. Sargent realized
that only a small proportion of the student body was served in
intercollegiate athletics, and he was especially anxious to arrange a test
in which all could compete, which would hold the interest and make
men strive to better themselves, and which would keep them in trim at
all times. It is this idea which has made Reilly's scheme for rational
athletics so successful.19 This is the crux of the whole situation:
whatever is done must hold the interest of the person tested and
stimulate him to make himself physically efficient.

Strength tests, dormant and unused for a period of fifteen to
twenty years, were revived by Rogers,20 who has scientifically shown
that such tests are valid as measures of general athletic ability and


19Reilly, Frederick J., New Rational Athletics for Boys and Girls. Boston, D. C. 
Heath and Company, 1917. 

20Rogers, Frederick Rand, Physical Capacity Tests in the Administration of 
Physical Education. New York, Teachers College, Columbia University, 
Contributions to Education No. 173, 1925. 




can be used readily to classify high school boys for purposes of 
competition. 

Development of Cardiac Functional Tests. With the invention
of the ergograph in 1884 by Mosso, the Italian physiologist,
the measurement of muscular strength took on a different aspect
from that which occupied the attention of Sargent. Mosso pointed
out that the ability of a muscle to perform was related to the
efficiency of the circulatory system, that any interference with the
nutritive functions of the body decreased the power to do work, and
that fatigue of one set of muscles affected others as well. He was a
pioneer in establishing the relationship between “physical condition”
and “muscular activity.”

Thus the attention of physical educators was turned away from
developmental and strength testing, and experimentation began
with attempts to find a more satisfactory indication of physical
condition. Soon after 1890 rapid advances were made in the physiology
of the heart and circulation, and special emphasis came to be
put on the hygienic rather than the muscle-building aspect. Lombard's21
studies in the early nineties indicate a trend of the times.
Says Burton-Opitz,22 “A number of very serviceable methods were
devised at about this time for ascertaining the pressure under which
the blood is made to circulate. It is true, however, that these
procedures were not applied in a practical manner until about the year
1900.”

Crampton,23 in 1905, published the results of his experimentation
along this line and set up a rating scheme to obtain an approximate
idea of the general condition of a person by noting changes in the
cardiac rate and arterial pressure on assuming the erect position.
A large number of tests made on high school pupils of New York City
led to the conclusion that the change from the horizontal to the erect
position increases the heart rate by 0 to 44 beats per minute and
causes variations in the systolic blood pressure ranging between
minus 10 and plus 10 mm. Hg.

21Lombard, Warren P., “Some of the Influences which Affect the Power of
Voluntary Muscular Contractions,” Jr. of Physiology, XIII (1892), 1-58.
22Burton-Opitz, R., “Tests of Physical ...,” Am. Phys. Educ. Rev., XXVII
(April, 1922), 153-159.
23Crampton, C. Ward, “A Test of Condition,” Medical News, LXXXVIII
(September, 1905), 529.

Very little was done in the way of cardiac functional tests from
1905 to 1914 with the exception of the work of McCurdy in 1910 on
“Adolescent Changes in Heart Rate and Blood Pressure,”24 out of
which grew a simple test of condition. In 1914, however, three
physical efficiency tests were reported: those of Meylan, Foster and
Barach. Meylan’s test included the elements of general condition,
rhythm and character of pulse rate, blood pressure, and a test of the
heart’s reaction to exercise.

The general idea in the Foster test was the same as that in the
Meylan test, but the possibilities of classifying individuals
according to their fitness were more elaborately set forth.

The Barach test was designed to yield an index of efficiency based
upon several determinations of the systolic and diastolic blood
pressure and the pulse rate taken during a period of one minute.
Both pressures are multiplied by the heart rate and the two products
added, yielding an arbitrary value from which the last two digits are
cut off to reduce the size of the index number.
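The arithmetic of the Barach index, as described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Barach's own procedure: the function name `barach_index` and the sample reading are hypothetical, and the sketch uses a single determination of each value because the text does not say how the several readings taken during the minute were combined. Cutting off the last two digits of a non-negative whole number is the same as integer division by 100.

```python
def barach_index(systolic: int, diastolic: int, heart_rate: int) -> int:
    """Hypothetical sketch of the Barach index as described in the text.

    systolic, diastolic: blood pressures in mm Hg (one reading each; assumed)
    heart_rate: pulse in beats per minute (one reading; assumed)
    """
    # Multiply each pressure by the heart rate.
    systolic_product = systolic * heart_rate
    diastolic_product = diastolic * heart_rate
    # Add the two products to get the arbitrary value.
    total = systolic_product + diastolic_product
    # "Cut off" the last two digits: integer division by 100
    # for a non-negative whole number.
    return total // 100


# Hypothetical reading: 120/80 mm Hg at 72 beats per minute.
print(barach_index(120, 80, 72))  # 144
```

Because both pressures are multiplied by the same heart rate, the index equals (systolic + diastolic) × heart rate with the last two digits dropped. For the hypothetical 120/80 reading at 72 beats per minute, 8,640 + 5,760 = 14,400, which gives 144.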

In 1916 appeared the Barringer test which attempted to show 
that the individuals who were physically deficient would display a 
“delayed rise” in blood pressure after the completion of an exercise. 

After much experimentation with Crampton’s and Foster’s tests,
Schneider25 found them unsatisfactory “because of the fact that
physical deterioration may be manifest in various ways in the
cardiovascular mechanism.” He then proceeded to set up a test of his own
which recognized more factors than had any test previously devised.
This received wide use in aviation during World War I to determine
fatigue and physical condition for flying. The test compares the pulse
rate and blood pressure in the reclining position with those in the
standing position, and also measures the ability to recover normal
standing records after a measured amount of exercise. Deficiency
in physical fitness manifests itself in lack of cardiovascular
compensation, and thus this test served a very practical purpose.

During World War I and immediately following, British investigators,
led by Campbell,26 using this same general principle of
cardiovascular adjustment to general physical fitness, devised a test
involving breath holding and recovery after exercise. Later these
tests were shortened to a pulse rate recovery test known as “Campbell's
Pulse Ratio Test,” a test which can be easily and quickly given.

24McCurdy, J. H., “Adolescent Changes in Heart Rate and Blood Pressure,”
Am. Phys. Educ. Rev., XV (June, 1910), 421.
25Schneider, E. C., “A Cardiovascular Rating as a Measure of Physical Fatigue
and Efficiency,” Jr. Am. Med. Assoc., LXXIV (May 29, 1920), 1507.
26Campbell, J. M. H., “Weight, Vital Capacity, Pulse Rate Before and After
Exercise and Physical Fitness in Health,” Guy's Hospital Reports, Vol. 75, p. 263.

This phase of testing has made a definite contribution in establishing
the relationships that exist among the physiological systems and in
demonstrating that the body reacts as a whole. Mosso’s propositions of
1884 have been given a practical application. For muscular activities
to be at their best, nutrition, circulation and the nervous system must
also be in good condition; or, to put it in terms of physical education,
ability to perform may be considerably modified by physical condition.

Development of Physical Ability Tests. A number of factors
led to the decline of the strength test and the development of an
interest in the ability to handle the body in running, jumping,
climbing, throwing and the like. The strength test was criticized
on the ground that it was not a good test of endurance or of heart and
lung development. Further, the idea got abroad that men became
muscle-bound by strength test practices, that these practices
developed the “draft-horse” type of man, the strong man type with
short limbs, large chest, broad back and shoulders, and great girth
measurements of all parts. The athletes particularly felt that it
ruined them for intercollegiate competition of any sort and refused
to have anything to do with it. A cry was set up for a test in which
strength should be a minor factor and speed and endurance of first
importance. This unrest soon after 1900 culminated in the devising of
tests which measured the elements of speed and endurance. Here again,
Sargent did pioneer work and in 1901 devised a test in the nature of
six simple exercises which were continued for a period of thirty
minutes without rest and in which the survivors were considered to
be efficient physically.
Prior to this time, in 1894, we find the beginning of the physical
ability and classification tests in the Normal School of Gymnastics
at Milwaukee, where a student’s ability in nine events was measured.
These tests included jumping, climbing, shot-putting, lifting,
etc. About the same time a class pentathlon was proposed by the
Lake Erie District of the Turnerbund and given first by the Gymnastic
Societies of Cleveland, Ohio, on Labor Day.27
We must give credit, however, to Meylan ... for the
development of a comprehensive test ... running, jumping, vaulting,
climbing and the like. His work at Columbia began in 1904, was
apparently well considered and spread rapidly, until in 1915 and
1916 testing individuals according to the elements involved in play
was almost universal.

27“Class Pentathlon,” Mind and Body, Vol. I, No. 8 (October, 1894), 7-10.

1. Testing in Public and Private Schools. Turning now to the
development of testing in public and private schools, we find, in
1904, a team competition going on at Phillips Andover Academy
under the direction of Pierson S. Page.28 Though this is not a
physical test in the strict sense of the word, the idea is present, and
it shows a distinct advance toward a scheme of scoring and a desire
to record performances of boys who are not competing on athletic
teams.

Beginning about 1908, physical ability tests were given in the
Cleveland Public Schools,29 and in the same year the New York
Public School Athletic League30 organized competition in three
events, requiring 80 per cent attendance.

“Efficiency tests" were conducted in the High Schools of Cincin- 
nati beginning in 1910. A button test for all-around efficiency in 
five events was organized. The boys were divided into two classes, 
Juniors and Seniors, with different standards for each. To win a
button, an individual had to score at least 40 points, with at least 7
points in each event out of a maximum of 10 per event.
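The Cincinnati button rule combines a per-event floor with a total, and the two conditions can be stated as a short check. The sketch below is hypothetical: the function name `earns_button` and the constant names are illustrative, it assumes each event's performance has already been converted to whole points from 0 to 10, it reads "score 40 points" as "score at least 40 points," and it does not model the separate Junior and Senior standards, whose conversion tables are not given here.

```python
from typing import List

EVENTS = 5          # the Cincinnati button test had five events
EVENT_MAX = 10      # maximum points in any one event
EVENT_MIN = 7       # minimum points required in every event
BUTTON_TOTAL = 40   # total required (assumed to mean "at least 40")


def earns_button(event_points: List[int]) -> bool:
    """Hypothetical check of the button rule on already-converted points."""
    if len(event_points) != EVENTS:
        raise ValueError(f"expected {EVENTS} event scores")
    # Assumption: points run from 0 up to the 10-point maximum.
    if any(p < 0 or p > EVENT_MAX for p in event_points):
        raise ValueError(f"each event is scored from 0 to {EVENT_MAX}")
    # Both conditions must hold: a floor in every event and a total.
    meets_floor = all(p >= EVENT_MIN for p in event_points)
    return meets_floor and sum(event_points) >= BUTTON_TOTAL


print(earns_button([8, 8, 8, 8, 8]))      # True: 40 points, none below 7
print(earns_button([7, 7, 7, 7, 7]))      # False: only 35 points
print(earns_button([10, 10, 10, 10, 6]))  # False: 46 points, one event below 7
```

Five events at the minimum of 7 points yield only 35, so a boy who merely met the per-event floor everywhere would still fall 5 points short. Conversely, a high total cannot make up for a single weak event.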

Certain playground efficiency tests were conducted in 1911 and 
"average records for average boys and girls" were secured by John 
H. Chase. 

About this time also a scoring system for athletic events was set 
up in the St. Louis Public Schools.31 Boys and girls were divided
into four weight classes with three events in each class. 

The next important step in physical ability tests for boys and 
girls was taken by the Playground and Recreation Association of 
America in the publication of the Athletic Badge Test for boys in 
April 1913. Later came the Athletic Badge Test for girls. The 
purpose of these tests was to stimulate in every boy and girl the 
desire to reach a certain minimum physical standard and especially 
to benefit children in the rural districts. 


28Page, Pierson S., “Recreative Athletics, Gymnastics and Games,” Am. Phys.
Educ. Rev., IX (September, 1904), 206.
29“Tests in the Cleveland High Schools,” Am. Phys. Educ. Rev., XIII (April,
1908), 241 and XIII (June, 1908), 574.
30“New York Public School Athletic League,” Am. Phys. Educ. Rev., XIII ...
31“St. Louis Public Schools Scoring System for Athletic Events,” Mind and
Body, XVIII (January, 1912), 407.




From 1913 onward a great wave of testing in physical education 
gradually swept the country. In 1914, J. H. Richards of Newark, 
N. J., worked out his Physical Education Efficiency Tests for Grade 
Schools. His point scoring system included seven events with two 
classifications and standards for attainment according to age, height 
and grade. The Detroit tests came into being soon after 1914, and
the age aims set up by Stecher at Philadelphia are valuable
historically, as are also those outlined in the New York State Physical
Ability Test. The Decathlon Test in California under the
administration of Hetherington and later Stolz has done much to stimulate
the testing of elementary school boys and girls. One of the most
original contributions in the field is that given us by Reilly in his
“Rational Athletics for Boys and Girls.” Nash, formerly of Oakland,
contributed excellent data in the study of performances in
various events. The reports of the national committees on standard
physical efficiency tests gave exceedingly valuable data on the
development of tests of physical ability. McCurdy of Springfield
contributed a great deal of work along this line, as did also Maroney
for the public schools of Atlantic City, N. J. Later contributions of
data came from R. K. Atkinson of the Russell Sage Foundation.

The demand arose for a battery of motor tests which were
scientifically worked out and which could be readily used with large
groups. In 1927, Brace brought forward a scale of motor ability
tests which has proved exceedingly valuable in the classification
of pupils and in furnishing a basis upon which to evaluate
achievement.”32 These tests are of the nature of stunts calling for a large
variety of general coordinations such as agility, balance, control, 
flexibility and strength. 

Bliss33 made a valuable contribution to the measurement ...
in making an analysis and scientific study of ... in physical
education activities. More work of this type will be reported in
Chapter V, “Athletic Achievement Tests and Scoring ....”

2. Testing in Colleges and Universities. During the latter part
of the period from 1904 to 1915 the interest in ... these tests in
colleges and universities became so intense as to bring into being a
national athletic fraternity, Sigma Delta Psi, founded at Indiana
University in 1912. The aim of this organization was to stimulate
interest in physical condition and all-around development, and
chapters were organized in many of our leading colleges and
universities.

32Brace, David K., Measuring Motor Ability, preface, p. xvi. New York, A. S.
Barnes and Company, 1927.
33Bliss, ..., “A ... of Progression Based on Age, Sex and Individual ...
Skill,” Am. Phys. Educ. Rev., XXXII (..., 1927), 11-21 and XXXII (February,
1927), 85-57.

A number of departments of physical education for men in colleges 
and universities set up excellent tests. Chronologically, the first of 
these which should be mentioned is the one at the University of 
California under the direction of Professor Frank Kleeberger.34
This was inaugurated in the fall of 1915 and had the distinctive 
feature of classification based on knowledge and skill displayed in 
agility, defense and swimming. 

The Physical Ability Test35 at the University of Oregon was
developed in 1921 under the direction of Professor Harry A. Scott. 
The test was designed to find out whether a man possessed those 
abilities which the department was trying to develop. If he had, he 
was privileged to specialize in any line of activity which he desired. 
If he had not, he was assigned to such work as would enable him to 
develop physical skills. 

The scheme of testing at the University of California at Los
Angeles36 included four sets of tests for the required four semesters
of physical education. These batteries of tests were used primarily 
for semester grading. 

Schuettner,37 while at the University of Illinois in the period
directly after World War I, worked out a very elaborate scheme of
testing to stimulate interest in physical education among the men
of the University. Though the plan is not followed at present, it
represents a phase of the development of testing and should be
mentioned for that reason. The tests were divided into five sections
with twenty-three events in all, arranged on a “point system” so
that the average student who was willing to exert himself could
qualify for the emblem of the lowest division and, with further effort,
raise his total score and secure the insignia of each of the
intermediate divisions and finally the highest award.

34Kleeberger, Frank L., “Physical Efficiency Tests as a Practical Means of
Popularizing Physical Education at the University of California,” Am. Phys.
Educ. Rev., XXII (December, 1917), 551-554 and XXIII (January, 1918), ...
35“The Pentathlon, A Physical Ability Test,” Am. Phys. Educ. Rev., XXIX
(January, 1924), 30-32 and XXIX (February, 1924), 88-94.
36“... of Tests,” Am. Phys. Educ. Rev., XXVII (November, 1922), ...
37Schuettner, A. J., “The University of Illinois Plan to Stimulate Interest in
...,” University of Illinois Bulletin, Vol. 16, No. 53 (April 14, 1919).

The Ohio State plan reported by Nichols in 1920 did not present 
anything new but it is interesting to note that in order to elect a 
desired activity a man must pass the efficiency test with a grade of A. 

Metcalf38 proposed a minimum and a maximum test for college
men and based the test on “natural movements which are the foun- 
dation of practically all forms of work and play"—running, climbing, 
throwing, lifting and swimming. 

McCurdy,39 as chairman of a National Committee on Motor 
Ability Tests, pointed out the possibilities of extending testing to 
the various games, particularly the major sports, and set up tests 
of strength, skill, speed, endurance, and agility in football, soccer, 
field hockey, basketball and tennis. 

Still more recent was Cozens' study,40 which sets forth a battery
of tests that can be used to classify college men according to their
big-muscle efficiency and which also indicates their special weaknesses.
Like the studies of Rogers and Brace, this has been scientifically
constructed and set up according to recognized statistical procedures.
It is used primarily to measure freshmen and classify them according
to their ...

3. Tests for Women and Girls. Prior to 1920 little was done in
working out tests for girls and women. Wayman's report gives us
the results of the work of the committee appointed by the College
Women Directors of Physical Education in ... 1925. This test
brings out not only the motor ability or physical efficiency of the
individual by means of certain events of skill in ... but also
stresses physical fitness or “soundness” as disclosed by ... Motor
test. ..., classifying the aspects of motor ability under ...,
in 1923 devised a team of eight tests correlating highly with a
reliable criterion. These tests were worked out with Barnard
College women, embody a scientific procedure, and are a real
contribution to the testing program in physical education.

38Metcalf, T. N., “Standards and Tests in Physical Education,” Am. Phys. Educ.
Rev., ...

... (May, 1924), ...

...“Measurement of General Athletic Ability in College Men,” University of
Oregon ...

...“...Motor Ability,” Arch. of Psych., No. 62 (April, 1925).

...“...Quotient,” Am. Phys. Educ. Rev., ...

30 The Status of Measurement in Physical Education

Besides objective and subjective measurements of motor ability,
the scoring scheme suggested by Collins and Howe42 at Wellesley
awarded points in a lump sum for the physically and medically fit, and
employed certain physiometric tests and somatometric indices in
getting at the problem of physical fitness and efficiency.

Florence Alden, working at the University of Oregon, in 1922 set
up a “self-testing” program administered by means of the squad
system. Although passing the medical examination was a
prerequisite to the test, the “proficiency” rating of each individual
was based on skill alone.

Development of Indices for Measuring Physical Efficiency. 
All of these physical or athletic tests require a more or less elaborate
recording system, and it is difficult to obtain quickly a summary of
one's standing. To obviate this difficulty came the index, which
attempts to show in a single figure how near to some ideal standard
each student has arrived. Here are met such extremes of simplicity
and of complexity as the “vital index” on the one hand and indices
requiring logarithms and statistical tables on the other, all aimed at
expressing one's physical make-up in a single figure.

Very little was done up to 1920 in the way of devising indices for
measuring physical efficiency. Pignet's formula for estimating
muscular strength was devised in 1901 and is reported by Martin.43
Oppenheimer's scale for measuring general physiological condition
with emphasis on nutrition is given by Williams.44 Sargent's
“Physical Test of a Man” came in 1921 and was later modified by
Schwegler and Englehardt45 at the University of Kansas. A great
many indices, used to indicate health, build, anthropological
relations and the like, have been devised from time to time since
Steet in 1874 set up the Weight-Height Index. Some of these will be
reported later.

42Collins, Vivian D. and Howe, Eugene C., “The Measurement of Organic and
Neuromuscular Fitness,” Am. Phys. Educ. Rev., XXIX (February, 1924),
64-70.

43Martin, E. G., “Tests of Muscular Efficiency,” Physiol. Rev., I (July, 1921),
454.

44Williams, J. F., The Organization and Administration of Physical Education,
Chap. 12, p. 255. New York, The Macmillan Company, 1923.

45Schwegler, R. A. and Englehardt, J. L., “A Test of Physical Efficiency,” Am.
Phys. Educ. Rev., XXIX (November, 1924), 501-505.


(B) Modern Developments in Physical Education Measurement


Two basic changes characterize the difference between earlier and 
modern developments in physical education measurement. These 
changes center around the fact that current educational philosophy 
conditions contemporary measurement practices. Changes in pro- 
gram emphases have resulted in changes in measurement procedures. 
This is manifested, first, by the fact that modern test builders 
attempt to have their tests meet acceptable scientific requirements. 
The scientific movement in education has stimulated efforts to 
apply scientific procedures to the solving of educational problems 
in general and measurement in particular. Much of the work done 
in physical education measurement prior to 1925 was unscientific, 
but since that time and increasingly so today investigators in this 
field have been trained in the scientific approach to test construction. 
Use of approved research and statistical techniques in the develop- 
ment of measurement tools has improved the validity and reliability 
of available measures. The scientific attitude has also been evi- 
denced in the considerable amount of recent research directed 
toward better understanding of the nature of the variables to be 
measured, and the interrelationships among these traits. While 
much of this research tends to reveal the limitations of existing 
measures and the need for much additional research, the very 
recognition of this need is in itself an encouraging sign of progress. 

Second, modern measurement programs are characterized by use 
of a greater variety of measures in relation to a given individual or 
program. The objectives of physical education are more broadly
conceived. Concurrent with this is a greater understanding of the
need to appraise all program objectives, and recognition of the
limitations of a single test or index as a program gauge. This has
resulted in increased use of and research on such types of
measurement tools as sport technique tests, knowledge and attitude
tests, rating scales for measuring the more intangible program
objectives, fitness measures selected in terms of appraising the total
personality, and diagnostic tests including those of general motor
and athletic ability. These supplement, but by no means replace, the
types of measures emphasized earlier, which include anthropometric,
cardiovascular, strength and athletic achievement tests.

Succeeding chapters of PART I of this text are devoted to a 
description of modern tests and measurements in physical education. 
While many of those described require further study, these were 
selected because they appear to have possibility for use either in 
testing programs or research. Several of the earlier tests are likewise 
described at some length, either because of their historical contribu- 
tion, or their addition to the understanding of current work. The 
remainder of this chapter offers a brief overview of the present-day
status of physical education measurement. No attempt is made to
identify individual tests, which receive full treatment subsequently.
This section serves only to orient the reader to the general scope
of the field; the specifics should be reviewed in their full contexts.
Nine groupings have been arbitrarily set up, which classify types of 
measurement according to the purposes for which they were devised. 

1. Anthropometric Measurement. The fact that anthropometric
measurement was the earliest type in the profession and was later
overshadowed in importance does not indicate a lack of present
concern in this area. Many of the earlier tools and procedures of
anthropometric measurement are still extensively used, although
their function and interpretation have undergone some change. Age,
height and weight measures, indices to predict normal weight, and
attempts to gauge nutritional status by body measurements and
stature continue to receive experimental attention and practical
application. The need to interpret such measures in terms of the
growth and development patterns of each individual rather than in
terms of rigid norms is now recognized. Present-day emphasis on
physical fitness has revived interest in chest and other body girth
measures, but these, too, are to be interpreted in relation to the
total structural pattern of the individual and to other factors.

The attempt to improve posture is perhaps one of the oldest
objectives in physical education, and numerous rating devices and
scientific measurement techniques of varying practicality may be
found. Considering the educational import of normalcy of growth
and development, continued use of, and research on, anthropometric
measurement tools can be expected.

2. Cardiovascular Tests. There has long been a need for tests 
of the cardiovascular type which can be administered by the phys- 
ical education teacher to groups of pupils for the purpose of quickly 
classifying them into one of two groups: (1) those who need an 
examination by a physician before they are allowed to participate 
in strenuous physical activities, and (2) those who can immediately 
participate in strenuous activities. Unfortunately we do not as yet
have reliable group cardiovascular tests, but in recent years tests
have appeared which are designed to be administered as individual
tests by the trained physical education teacher when there is doubt
as to the “physical condition” of the pupil and the services of a
physician are not immediately available. These tests have been
developed out of the earlier work by Schneider and his predecessors.
Many of them aimed toward a simplification of procedures, and
proposed various combinations of pulse counts or blood pressure
readings in relation to standard exercises.

A significant amount of research has been conducted in this area 
in recent years. Factor analysis studies have added to the under- 
standing of components of cardiovascular function. The emphasis 
during World War II on physical fitness resulted in considerable 
experimentation on tests and indices based on cardiovascular re- 
action to strenuous exercise, some with additional packs or loads. 
The techniques developed were extensively used in college and 
Armed Service wartime fitness programs. Research endeavor has 
also been directed toward the development of index scores based 
upon actual performance; and the determining of effects of various 
rates of exercise, durations of exercise, and types of exercise on 
cardiovascular response. 

Despite noteworthy advances, the need exists for continued 
research directed toward the development of tests of this type which 
are wholly satisfactory for use in school physical education programs. 

3. Athletic Achievement Tests and Scoring Scales. Since 
interest was first shown in sports and games in the physical edu- 
cation program, attempts have been made to measure achievement 
in these activities. Athletic achievement tests as they evolved 
measured performance levels in given fundamental athletic skills, 
such as track and field type activities, throwing for distance and
accuracy, kicking for distance and accuracy, and specific game skills
including tennis serve, soccer dribble for speed, basket shooting, 
volleyball serve and many more. Standards were generally set 
arbitrarily by expert opinion. The chief limitations of these early 
empirical scoring plans were that there was often no indication 
whether the standard was minimum or maximum for the group; 
scores were not equated among the events, so that a 50-point score on
each of two events did not necessarily stand for equivalent levels of
performance; and the increment in scoring points did not increase as
performance became more difficult at the higher levels.
These limitations have been overcome by the application of 
statistical techniques to the construction of scoring scales. The 
development in recent years of scientifically constructed scoring 
scales in many activities including track and field events, sports and 
aquatic skills, and motor fitness events, has been a major contri- 
bution to the field. Three general types of scoring scales that are 
scientifically constructed have appeared. The first utilizes T-scale 
or standard score techniques (and variations of them), which equate 
performances for a given homogeneous group on the basis of the 
variations of the scores from the mean or average of the group. 
The second type is also based on the same principle, but adds the 
factor of finer groupings by use of a coefficient plan with separate 
scales proposed for varying age, height and weight groups. The 
third plan, utilizing the increased-increment principle, allows 
greater awards as the performance becomes more difficult, and is 
applicable to heterogeneous groups. 
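The first of these three types can be illustrated concretely. Below is a minimal Python sketch, assuming the linear standard-score form T = 50 + 10(x − M)/SD, where M and SD are the mean and standard deviation of one homogeneous group. McCall's original T-scale reaches its scores through percentile normalization instead, and the coefficient and increased-increment plans are not modeled here; the function name and the sample data are illustrative only.

```python
from statistics import mean, pstdev


def t_scores(raw, higher_is_better=True):
    """Linear T-scores for one homogeneous group: T = 50 + 10 * (x - M) / SD.

    Assumption: the population SD of the group itself is used (pstdev).
    For timed events pass higher_is_better=False, so that faster
    (smaller) times earn more points.
    """
    m = mean(raw)
    sd = pstdev(raw)
    if sd == 0:
        raise ValueError("all scores are equal; the scale is undefined")
    sign = 1 if higher_is_better else -1
    return [50 + 10 * sign * (x - m) / sd for x in raw]


# Hypothetical broad-jump distances (inches) and dash times (seconds).
jump = t_scores([60, 70, 80])
dash = t_scores([7.0, 7.5, 8.0], higher_is_better=False)
print([round(t, 1) for t in jump])  # [37.8, 50.0, 62.2]
print([round(t, 1) for t in dash])  # [62.2, 50.0, 37.8]
```

Because every event is placed on the same mean of 50 and the same spread of 10 points per standard deviation, the best jump and the best dash above earn the same score: each lies the same distance above its group average, which is the sense in which such a scale equates events.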
Achievement scales prove extremely useful to stimulate pupil
interest, determine group and individual skill level, and indicate
special deficiencies or strengths. The measurement of achievement
in the fundamental skills of physical education has become an
integral part of a well-conducted program.


4. The Classification of Pupils. For a period of thirty years
or more physical educators have recognized the desirability of
equalizing the physical differences which exist among children of
the same age group. The range of size, maturity and performance
ability of children of a given age presents a number of problems in
physical education, among which may be mentioned a lack of
interest on the part of the smaller and weaker children, and a real
physical hazard for these same pupils while in competition with
those who are more mature and of superior size and strength.
Although it might be shown that a large number of physical 
measurements have a more or less significant bearing upon per- 
formance, the factors of age, height and weight, when used in the 
proper combination, offer a simple and practical classification device. 
The proper combination of these factors has been verified by
several different researches, and these scientifically derived indices 
have replaced the earlier schemes which were empirically determined. 
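An age-height-weight index of this kind is a weighted sum of the three factors. As an illustration only, the sketch below uses the weights often cited for McCloy's Classification Index I (20 × age in years + 6 × height in inches + weight in pounds); those weights, the units, and the class boundaries in the example are assumptions to be replaced by those of whatever published index is actually adopted.

```python
def classification_index(age_years, height_in, weight_lb,
                         w_age=20, w_height=6, w_weight=1):
    """Weighted sum of age, height and weight.

    Default weights follow the form commonly cited for McCloy's
    Classification Index I; treat them as an assumption.
    """
    return w_age * age_years + w_height * height_in + w_weight * weight_lb


def assign_groups(pupils, cut_points):
    """Map each pupil to a class number by counting cut points at or below the index.

    pupils: iterable of (name, age, height, weight) tuples.
    cut_points: ascending, hypothetical index boundaries; class 0 lies below the first.
    """
    result = {}
    for name, age, height, weight in pupils:
        index = classification_index(age, height, weight)
        result[name] = (index, sum(index >= c for c in cut_points))
    return result


pupils = [("A", 14, 62, 100), ("B", 16, 68, 140)]
print(assign_groups(pupils, cut_points=[700, 800, 900]))
# {'A': (752, 1), 'B': (868, 2)}
```

Pupils who land in the same class number would then compete or be scored together, whatever their chronological age alone might suggest.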

Since it is often desirable to use other factors than age, height and 
weight for homogeneous groupings, various plans have been sug- 
gested using strength and motor ability tests and indices. Other 
types of tests, depending upon the kind of grouping required for the 
activity concerned, have also been recommended for classification 
purposes. 

5. The Measurement of General Qualities. A considerable 
number of the tests available in the field fall within this category. 
Here are grouped those tests which are used for measuring certain 
general qualities commonly known as general motor ability, capacity 
and educability, general athletic ability, neuromuscular skill capa- 
city, strength, power and the like. General qualities are contrasted 
with the specific ability required in a particular sport. 

Many well conceived and carefully constructed tests of general 
motor educability, capacity and ability, and general athletic ability 
have been developed. Motor educability refers to the facility with 
which one learns motor skills; motor capacity, to one's inherent 
capabilities to learn motor skills; motor ability, to one's acquired 
level of learned motor skills; and general athletic ability, to one's 
acquired level of learned athletic skills. These tests have evolved 
out of earlier general athletic achievement tests, and by the careful 
application of approved statistical techniques significant measure- 
ment tools have been made available. Continuous research is being 
conducted to modify existing batteries, to throw additional light on 
their potential use, and to develop new techniques. 

Considerable research has been done in the area of strength
testing in recent years. Attention has been directed toward the
standardization of procedures, the proper use of equipment, and
the exploration of the relationships between strength and other
qualities. Strength tests and physical fitness indices derived
therefrom have been introduced into the modern measurement program
of physical education as measures of general athletic ability and big
muscle activity habits, and, when used by properly qualified
individuals, may indicate functional vitality which in turn seems
fundamental to health. While the average teacher must not attempt
to prescribe health needs on the basis of findings from strength tests
and physical fitness indices, several of these have been shown to be
of considerable importance to the teacher both as an indicator of
physical condition and as a motivator. In addition, available indices
of physical capacity and power appear to have some value as
measures of the muscular potentialities required both for general
physical achievement and for achievement in vigorous team games
and sports.
6. Physical Fitness. Physical fitness, though variously defined, 
has always been one of the primary objectives of physical activity 
programs. The review of earlier test development revealed the 
continuous concern with the health aspects of physical education. 
Cardiovascular tests, anthropometric tests, strength tests, and even 
athletic performance and motor ability tests have been proposed 
to appraise either directly or indirectly aspects of physical fitness. 
The two world wars have served to emphasize this objective of physical
education. Out of the considerable literature and research on the
subject has evolved a concept of physical fitness which recognizes
the term as pertaining to the total functional capacity of the
individual in relation to the specific work or task he must undertake.
This concept recognizes the inseparability of all systems of the body,
including the organic, the muscular and the cardiovascular-
respiratory. It further recognizes the need for the neuromuscular
skills and the proper motivations and attitudes essential to carry a
task to its successful completion. Understanding this concept makes
it clear that all types of tests which reveal significant information
about physical and mental functioning have import in the evaluation
of physical fitness. For this reason all tests in this volume have
significance in varying degrees for the appraisal of physical fitness.
A specific recent development has been the renewed interest in a
type of measurement termed motor fitness. Motor fitness tests aim
to measure the fitness of the body for strenuous work, and define as
their components strength, agility, speed, endurance, power and the
like. The emphasis of wartime service and civilian fitness programs on
the motor aspects of fitness resulted in the development of many test
batteries which included such test elements as push-ups, pull-ups,
squat-jumps, squat-thrusts, endurance runs, agility runs and sit-ups.
These motor fitness tests have generally been used as one aspect of
testing programs which also include other measures of physical
fitness, such as medical and psychological examinations, and various
types of skill fitness, particularly in water safety.

7. Sport Technique Tests. The discussion of athletic achieve- 
ment tests and scoring scales revealed that measurement of the 
separate skills of games and sports was among the earlier measure- 
ment efforts in the modern program of physical education. Not until 
comparatively recently, however, have attempts been made to 
measure skill in the game itself. Sport technique tests are designed 
either to predict potential playing ability or evaluate present status 
or level of ability in the given sport. These tests are generally con- 
structed by analyzing the component skills of the sport, and then 
by statistical means selecting those few skills which best represent 
the whole, Although a considerable number of sports technique 
tests have been suggested, those for which validity have been found 
to be significant are still comparatively few. It is obvious to all that 
Success in a sport is dependent upon more than the sum of ability 
in the component skills of the game. The problem is complicated by 
the complexity of variable factors which make for success in the 
average sport, and the practical need to adhere to the criterion of 
administrative economy for tests to be used in the school program. 
Since sports now comprise the major portion of physical education 
Programs, much research interest is directed toward this phase of 
Measurement. Progress is being made, and improved tests are 
Continuously reaching the field. 
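The statistical selection of a few representative skills can be sketched as follows. This is a deliberately simple stand-in, assuming a criterion such as judges' ratings of playing ability: it ranks each component skill test by its Pearson correlation with the criterion and keeps the top k. Published batteries have generally relied on multiple correlation, which also accounts for overlap among the skills; the simple ranking here ignores that redundancy. All names and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def select_skills(skill_scores, criterion, k):
    """Return the k skill tests that correlate most highly with the criterion.

    skill_scores: dict of skill name -> scores, one per player.
    criterion: ratings of playing ability for the same players, same order.
    """
    ranked = sorted(skill_scores,
                    key=lambda s: pearson_r(skill_scores[s], criterion),
                    reverse=True)
    return ranked[:k]


criterion = [1, 2, 3, 4, 5]  # hypothetical judges' ratings
skills = {"serve": [2, 4, 6, 8, 10],
          "dribble": [5, 4, 3, 2, 1],
          "pass": [1, 3, 2, 5, 4]}
print(select_skills(skills, criterion, 2))  # ['serve', 'pass']
```

A sign convention matters in practice: for timed skills, where a lower score is better, the correlation comes out negative and the scores should be reversed before ranking.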

8. Knowledge and Information Tests. Within the past ten
years a great deal of attention has been paid to this phase of the
measurement program. An increasing number of health knowledge
tests, constructed according to approved techniques, are appearing.
Standardized health knowledge tests serve as important program
gauges. In view of modern curriculum trends, however, locally
constructed tests directed toward meaningful local objectives take
preference. For this reason it is important that teachers be familiar
with the most advanced techniques of test construction in this area
to ensure carefully prepared tests designed to meet local needs.

Knowledge tests are likewise important in the physical education
program. Many elements of physical education instruction,
including playing rules, team tactics, game courtesies, and
officiating, can be measured by pencil and paper tests. There is a
decided place for standardized rules and knowledge tests in those
physical education activities which are fairly uniform both
sectionally and nationally. A number of these are available in
periodical literature, but none has been prepared and distributed by
the national test construction agencies which supply tests in health
education and most other school subjects.

9. Rating Scales. Various rating scales have been developed to 
measure those aspects of behavior, performance, attitudes or pro- 
gram organization in physical education which do not readily lend 
themselves to more objective means of measurement. An excellent 
start in the development of behavior rating scales for use in physical 
education situations has been made, but much additional effort in 
this area is indicated. A basic need of the field is for cooperative 
effort among physical educators, sociologists and psychologists to 
develop adequate measurement tools for the appraisal of the complex 
social objectives of physical education. 

While tools to rate behavior and social efficiency in physical educa- 
tion are still few, the tremendous advances in the field of general 
educational measurement in the past several decades provide valu- 
able resources for the physical educator for both guidance and re- 
search purposes. Use of these tools requires preparation in addition 
to that normally given in physical education measurement, however. 


Selected References 

Burton-Opitz, R.: “Tests of Physical Efficiency,” Am. Phys. Educ. Rev., XXVII
(April, 1922), 153-159.

An account of the change from tests of strength to cardiovascular tests in
rating bodily efficiency.

Larson, Leonard A. and Cox, Walter A.: “Tests and Measurements in Health
and Physical Education,” Suppl. to Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. 12, No. 2 (May, 1941), 483-489.

Discusses the nature and scope of measurement in health and physical
education, and the administration of the program.

Powell, Elizabeth: “The Present Status of Physical Indices,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 2 (May, 1940), 3-17.

Presents a brief, concise review of the scope of physical education
measurement including references to specific tests and related researches.

Sargent, Dudley A.: “Twenty Years Progress in Efficiency Tests,” Am. Phys.
Educ. Rev., XVIII (October, 1913), 452.

This article treats particularly of the change from anthropometric
measurements to strength tests and points out the philosophy back of the
change.

Seaver, Jay W.: Anthropometry and Physical Examination. Meriden, Conn.,
The Curtiss-Way Company, 1909. Pp. 191. Chapter I, “History of
Anthropometry.”

An excellent discussion of Greek standards with regard to anthropometry,
Roman and Egyptian canons, modern proportions, and the purpose of
anthropometry.

CHAPTER III 


Anthropometric Measurements


Hitchcock's Contribution. Dr. Edward Hitchcock at Amherst 
has the distinction of occupying the first chair to be established in 
physical education. As soon as the department was organized (1861), 
he set about to put physical education on a scientific basis. His 
contribution lies mainly in the field of anthropometry, though he 
was always eager to cooperate with any group for the advancement 
of physical education. During practically every year from 1861 to 
1901 his anthropometric tables of Amherst College men were 
published — sometimes half a dozen of them, showing height and 
weight of men at various ages and over various periods of time in all 
four classes of the college. These reports and tables comprise two 
large volumes. The trend of the times may be brought out by 
quoting from the report of the Committee on the Method of Phys- 
ical Measurements read by Dr. Hitchcock to the Association for the 
Advancement of Physical Education, November 26, 1886. 

The ultimate and philosophical aim of anthropometry is to ascer- 
tain the ideal or typical man and this must be the result obtained 
before we can do our best work ....We want to be able to tell norm- 
ally developed people what deficiencies they have, if any, and how 

eir best development can be brought about. And what shall be 
the character or standard we use in this work—shall it be age, weight 

height or some other basis?

We cannot use age as the standard since so many different weights,
sizes, and powers are to be found at the same age, and what might
be adopted for the person of twenty-five would be inadequate for
the youth of fifteen. So also weight will be found wanting as a
standard for these determinations, and for similar reasons.




And so will any characteristic if in our present state of anthropo- 
metric knowledge we assume an absolute or perfect standard. But 
the height of an individual, as seen by the rules of common sense, 
seems to have a stronger claim for a basis of proper proportions than 
can any other principle, so far as now understood. And the law of 
beauty confirms this view, since for the longer perpendicular lines 
of the body there must be larger corresponding girths in order to 
give the proper and graceful curves of trunk and limbs than for the 
shorter ones. The physiological fact has long ago been settled that 
lung capacity has a fixed ratio to body height, and military men well 
know that the soldier whose chest and abdomen (sitting height) are
the longest in proportion to the legs, will have greater physical
endurance than the reverse.

An advanced idea in anthropometry then may be to furnish a
series of charts or tables of different heights, say a centimeter or half
inch apart, by which any person may see at a glance whether he
corresponds or not in certain bodily measurements and tests to the
average which these charts represent.


As a matter of record, this is exactly what Hitchcock did for his 
students at Amherst. A chart was given to every man, showing 
what the average measurements were for his height and age, and in 
another column was posted, on this chart, by the examiner, the 
student’s own individual measurements. In his twenty-fifth annual
report, Hitchcock goes on to say, “And now that we have these data
of a quarter of a century, and these facts by the thousands, we may
be able to pronounce with more than the air of probability some of
the averages of a college student, and tell him what he may expect
to measure, weigh, blow, lift, push or do if he be an average college
student and wants the ‘mens sana in corpore sano.’ We have now
gone so far with the student that when we have given him his first
physical examination after entrance, we are able to furnish him with
a chart by which he may know at a glance whether he is at, near or
below his estimated average so far as measurements, tests, and
examinations will show.”
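Hitchcock's procedure, averaging each measurement for men of the same age and height and then setting the student's own figures beside those averages, amounts to a grouped table lookup. The sketch below assumes records stored as dictionaries with the hypothetical keys "age" and "height" (inches) plus any other measurements, and uses half-inch height bands, as the committee report suggests; it is an illustration, not a reconstruction of Hitchcock's actual tables.

```python
from collections import defaultdict
from statistics import mean


def height_band(height_in, step=0.5):
    """Round a height (inches) to the nearest band, half an inch apart by default."""
    return round(height_in / step) * step


def build_norms(records, step=0.5):
    """Average every measurement for each (age, height band) cell.

    Assumption: each record is a dict with 'age' and 'height' keys;
    every other key is treated as a measurement to be averaged.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["age"], height_band(rec["height"], step))].append(rec)
    norms = {}
    for cell, group in cells.items():
        fields = [k for k in group[0] if k not in ("age", "height")]
        norms[cell] = {k: mean(r[k] for r in group) for k in fields}
    return norms


def compare_to_norm(student, norms, step=0.5):
    """Return the student's deviation from the cell average for each measurement.

    Raises KeyError if no earlier student fell in the same age/height cell.
    """
    cell = (student["age"], height_band(student["height"], step))
    return {k: student[k] - avg for k, avg in norms[cell].items()}


records = [{"age": 19, "height": 68.2, "weight": 150, "chest": 36},
           {"age": 19, "height": 67.9, "weight": 140, "chest": 35}]
norms = build_norms(records)
print(compare_to_norm({"age": 19, "height": 68.1, "weight": 150, "chest": 35}, norms))
# {'weight': 5, 'chest': -0.5}
```

Note that Python's `round` sends exact ties to the even neighbor, so a height lying precisely on a band boundary may fall either way; a table built this way should state its own rule for such cases.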

Sargent's Contribution—An Anthropometric Chart. Sar- 
gent’s compilation of the measurements of students examined by 
him at Harvard University beginning in 1880 resulted in the
publishing of an anthropometric chart in 1886 such as is shown in
Fig. 1. By this means he endeavored to determine a physical standard
for the American college student by which the student could compare
himself with the entire group. These measurements included not
only age, weight, height, girths, breadths and depths but also lung
capacity, strength of back, legs and upper arms.

Fig. 1. Anthropometric chart: each measurement (age, weight, height, girths, depths, breadths, stretch of arms, lung capacity and strength) is plotted against percentile columns running from 5 to 95. (Sargent in Scribner's Magazine, July, 1887.)

Fig. 2. Apparatus formerly used at University of California to get measurements on men and to show posture. (Photo by courtesy of Professor Frank Kleeberger.)

This chart, like the others brought out subsequently, listed these
measurements according to percentiles. The more symmetrical the
individual, the closer he approached the 50th percentile. Techniques
similar to Sargent's are used today to show an individual where he
stands in relation to norms for an entire group in some of the tests
which are in common use and which include many of the elements of
play activities such as running, jumping, throwing, climbing,
vaulting and the like.
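Placing a measurement on a percentile chart like Sargent's involves two steps: finding the student's percentile rank within the group, then locating the nearest printed column. The sketch assumes the column values 5, 10, 20, ..., 90, 95 read from the chart in Fig. 1 and the common "count ties as half" definition of percentile rank; both are assumptions, not Sargent's documented procedure.

```python
from bisect import bisect_left, bisect_right

# Assumed percentile columns, as read from the chart in Fig. 1.
CHART_COLUMNS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]


def percentile_rank(value, group):
    """Percent of the group below `value`, counting tied scores as half."""
    s = sorted(group)
    below = bisect_left(s, value)
    ties = bisect_right(s, value) - below
    return 100.0 * (below + 0.5 * ties) / len(s)


def chart_column(value, group):
    """Nearest printed column for plotting the value; ties go to the lower column."""
    pr = percentile_rank(value, group)
    return min(CHART_COLUMNS, key=lambda c: abs(c - pr))


group = list(range(1, 101))  # hypothetical measurements of 100 students
print(percentile_rank(50, group), chart_column(50, group))  # 49.5 50
```

Plotting every measurement of one student this way and connecting the points gives the profile line of the chart; a student whose points all fall near the 50 column is, in Sargent's terms, the symmetrical individual.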

Other Anthropometric Chart Contributions. 
years of 1885 and 1900 there was a great deal of activity along the 
lines of anthropometric charts. Seaver charted the records of 2700 
Yale students; Gulick, of men between the ages of ty 


venty-five and 
thirty-five; Wood of Wellesley tabulated the measurements of 1500 


Between the 


Anthropometric Measurements 43 


students and issued a table in percentile form; Hanna of Oberlin 
issued a similar table compiled from the records of 1600 female 
students; and Clapp of the University of Nebraska tabulated and 
published a chart of the anthropometric records of 1500 women of 
that university who had been measured by her.¹

Special Instruments for Measuring Postural Conditions. Seaver describes a number of special instruments which were brought forward in the nineties to demonstrate anatomical and physiological facts. These include instruments for measuring the amount of the pelvic tip, for showing the exact contour of the chest, for tracing the anteroposterior depths at all points of the trunk, and for recording outlines of the body and abnormalities of spinal curvature.

Fig. 3. Table on which subject stands while photo is being taken. This can be turned and locked in various positions. (Photo by courtesy of Professor Frank Kleeberger.)

¹A rather complete account of these charts may be obtained from: Jay W. Seaver, Anthropometry and Physical Examination, Chapter VII.




Currently used posture measurement instruments include such devices as x-ray,² posturemeter,³ conformateur,⁴ pedorule,⁵ pedograph,⁶ scoliometer,⁷,⁸ and various types of photography.

Fig. 4. Photographs in which scales of measurement are readily seen and cross lines are in place for reference to postural defects. (By courtesy of Professor Frank Kleeberger.)

²McCloy, Charles H., "X-ray Studies of Innate Differences in Straight and Curved Spines," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 2 (May, 1938), 50-57.
³Buhl, Olga Anderson and Morrill, Warren P., "The Measurement of Postures," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 3 (October, 1941), 518-527.
⁴Cureton, Thomas K., Jr., "Bodily Posture as an Indicator of Fitness," Suppl. to Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 (May, 1941), 348-367.
⁵Danford, Harold R., "A Comparative Study of Three Methods of Measuring Flat and Weak Feet," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 1 (March, 1935), 43-50.
⁶McCloy, Charles H., Tests and Measurements in Health and Physical Education, p. 272. New York, F. S. Crofts and Co., 1942.




While motion pictures have not been widely explored as a means of posture analysis, their potentialities have received some recognition.⁹,¹⁰ Extensive use is made of photographs and silhouettes. Modern photographic techniques have simplified procedures for taking both still photographs and silhouettes, bringing the use of photography for posture analysis within the reach of every school and college.¹¹ A common practice is to take the pictures against a calibrated screen to facilitate measurement and appraisal. (See Figs. 2, 3, and 4.) Photographs and silhouettes are used for motivational purposes, for various subjective methods of posture analysis, and as the basis for several objective techniques of posture measurement, some of which will be described.

[Figure: four posture silhouettes labeled Good posture, Fair posture, Poor posture, and Very poor posture.]
Fig. 5. Posture silhouette photographs. Types found among college women. Part of entering physical examination. (By courtesy of the Department of Physical Education, University of Southern California.)

⁷Fitz, George W., "A Simple Method of Measuring and Graphically Plotting Spinal Curvature and Other Asymmetries by Means of a New Direct Reading Scoliometer," American Physical Education Review, Vol. XI, No. 1 (March, 1906), 18-24.
⁸Clarke, H. Harrison and Shay, Clayton T., "Measurement of Lateral Spinal Deviations," Black and Gold of Phi Epsilon Kappa, Vol. XVII, No. 2 (March, 1940), 38-42.
⁹Bass, Ruth, "A Study of the Mechanics of Graceful Walking," Res. Quart. Am. Phys. Educ. Assoc., Vol. VIII, No. 2 (May, 1937), 173-180.
¹⁰Cureton, Thomas K., Jr., "Elementary Principles and Techniques of Cinematographic Analysis," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 2 (May, 1939), 3-24.
¹¹Kelly, Ellen, "Taking Posture Pictures," Jr. of Health and Phys. Educ., Vol. 17, No. 8 (October, 1946), 464-465.

The Brownell Scale for Measuring Anteroposterior Posture.¹² In order to eliminate the use of expensive apparatus and reduce the time element to a minimum so that the scale may be applicable to school use, the quality of posture in Brownell's study was judged on the basis of the individual's entire profile or silhouette. A random sampling of 100 silhouettes was arranged in order of merit by a large group of experts or judges. By the use of appropriate statistical techniques, a scale of thirteen silhouettes was finally arranged from this random sampling. The scale samples (silhouettes), arranged in rank order, were then transmuted into units of amount or scores, and are presented on a chart. Below each silhouette is listed the scale score for that particular quality of posture.

In using the scale the teacher compares the silhouette of the individual to be graded with each type of posture shown on the scale. It is suggested that the comparison be made by starting from the bottom of the scale and working up, then beginning at the top end of the scale and working down. An average of the two comparisons may be taken as the individual's posture grade. This is much the same type of procedure as would be followed in using a handwriting scale.
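The two-pass comparison described above reduces to a simple average. In this minimal sketch the two arguments are hypothetical scale scores read from under the silhouettes matched in the bottom-up and top-down passes; no actual Brownell scale values are reproduced.

```python
def brownell_posture_grade(score_bottom_up: float, score_top_down: float) -> float:
    """Average of the scale scores matched in the two comparison passes."""
    return (score_bottom_up + score_top_down) / 2


# Hypothetical: the upward pass matched a silhouette scored 62,
# the downward pass one scored 70.
print(brownell_posture_grade(62, 70))  # 66.0
```

Averaging the two passes tends to cancel the drift a rater shows when always approaching the scale from the same end.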

Postural Measurement of the Pre-School Child.¹³ Crook's scale presents a series of silhouettes of pre-school children arranged in order of merit by fifty judges. The procedure followed is quite similar to that developed by Brownell.¹⁴ Each of 100 silhouettes was ranked by the judges from 1 to 100, and the value of the thirteen finally selected for the scale was determined by the percentile position of gross scores. The value of each sample was transmuted from a rank order into units of amount. In using the scale, the silhouette to be judged is moved along the scale until the type of posture most similar in quality is found. The posture grade is found below each of the thirteen types.

¹²Brownell, Clifford L., A Scale for Measuring the Antero-Posterior Posture of Ninth Grade Boys. Teachers College Contributions to Education No. 325. New York, Bureau of Publications, Teachers College, Columbia University, 1928.
¹³Crook, Billie Louise, "A Scale for Measuring the Antero-Posterior Posture of the Pre-School Child," Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 4 (December, 1936), 96-101.
¹⁴Brownell, Clifford Lee, op. cit.
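The text does not give the formula used to transmute ranks into units of amount. One common method of the period converts a mean rank R among N items to a per cent position, 100(R − 0.5)/N, and then to a normal-curve deviate. The Python sketch below assumes that method, ranks with 1 as best, and an arbitrary output scale with mean 50 and standard deviation 10; it is an illustration, not Crook's or Brownell's published procedure.

```python
from statistics import NormalDist


def rank_to_scale_score(mean_rank: float, n_items: int,
                        scale_mean: float = 50.0, scale_sd: float = 10.0) -> float:
    """Convert a mean rank (1 = best) to a score on a normal-curve scale."""
    # Per cent position as a proportion: midpoint of the item's rank interval over N.
    position = (mean_rank - 0.5) / n_items
    # Normal deviate for that position; better (smaller) ranks give larger values.
    z = NormalDist().inv_cdf(1.0 - position)
    return scale_mean + scale_sd * z


for rank in (1, 25, 50.5, 76, 100):
    print(rank, round(rank_to_scale_score(rank, 100), 1))
```

Working through the normal curve, rather than using ranks directly, spreads the extreme samples further apart than the middle ones, which matches the usual assumption that posture quality is normally distributed.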

Korb's Comparograph.¹⁵ Korb presents a method to increase the validity of judging silhouettes by use of the Comparograph. The norm is a composite silhouette outline, based on the examination of 2200 subjects. The outline of the norm posture is placed on a curtain against which the subject's silhouette is made. The subject's silhouette is then compared with the adjacent outline. Standards are given on the basis of which an A, B, C, or D rating can be made.

MacEwan-Howe Posture Measurement.¹⁶ In this method of grading posture an antero-posterior photograph (not a silhouette) is taken of the subject, and from the photograph three measurements are calculated which, when combined and properly weighted, represent the posture grade, expressed numerically or by letters. In order to make these calculations it is necessary to prepare the subject before she is photographed by affixing to the skin a number of light aluminum pointers. One of these is located at the end of the sternum, nine are located on spinous processes in the cervical, thoracic and lumbar regions, and one on the prominence of the first piece of the sacrum. Since the length of the pointers is known, the true position of the chest and spine can be drawn on the photograph despite projecting scapulae, arms and breasts.

The values to be determined are: (1) the amount of antero-posterior curvature in the dorsal and lumbar spine, (2) the amount of segmental angulation and body tilt, and (3) the position of the head and neck. When certain definite points are located on the photograph by means of a dissecting needle, the photograph is placed under a transparent triple scale and measurements are read directly on the scale.

The criterion used to establish the validity of these measurements as indicators of good posture consisted of the composite judgment of a group of judges who were specialists in the field of posture. When the factors upon which each judge based his opinion were correlated with the posture grade, it was found that four factors were prominent, and from these the three measurements mentioned above were derived. The multiple correlation (R = .812) between the criterion and the sum of the three measurements appears to be of a reasonable order for this type of study.

¹⁵Korb, Edwin M., "A Method to Increase the Validity of Measuring Posture," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 1 (March, 1939), 142-149.
¹⁶MacEwan, Charlotte G. and Howe, Eugene C., "An Objective Method of Grading Posture," Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 3 (October, 1932), 144-157.

The cost of the photographic apparatus and equipment and the running expenses are not excessive for institutions interested in this type of measurement. This cost, however, must be augmented by administrative expenses and a capital outlay for a photographic room.

Springfield Postural Measurements. Cureton and associates at Springfield College¹⁷ have done a considerable amount of experimental work in the development of techniques for objectively scaling various aspects of posture: (1) head thrust forward, (2) head tilt, (3) forward shoulders, (4) hip thrust forward, (5) hyperextended knees, (6) abdominal protuberance, and (7) body lean.

A conformateur apparatus for measuring spinal curves, used by a number of early investigators, has been greatly improved, and by means of special lighting effects the front tip of the shoulder, the tragus of the ear, and the greater trochanter of the femur can now be located on a silhouette. On the whole, objective measurements are four times as good as a subjective inspectional scheme.

Because many students of body mechanics believe that an analysis of posture in the upright position should begin with a consideration of the center of gravity, Cureton and Wickens¹⁸ have developed a static center of gravity test "which is definitely related to posture, strength, physical fitness and athletic ability."¹⁹

The subject to be tested is placed on the center of a board, 144 cm. long, each end of which rests on a balance scale. The internal malleoli are lined up with the center of the board. By means of certain calculations from the readings on each scale and the distance between board supports, a center of gravity can be determined. This may be transposed into a percentile score.
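The calculations from the two scale readings amount to a balance of moments. A minimal Python sketch, assuming that the scales sit under the two board supports, that each scale was first read with the empty board (tare), and that the internal malleoli are at the midpoint between the supports. The function name and readings are hypothetical, and the conversion to a percentile score is omitted because the norms are not given here.

```python
def center_of_gravity_offset(front_reading: float, rear_reading: float,
                             front_tare: float, rear_tare: float,
                             support_distance_cm: float) -> float:
    """Distance (cm) of the center of gravity in front of the board's midpoint.

    A negative result means the center of gravity lies behind the malleoli.
    """
    front_load = front_reading - front_tare   # subject's share on the front scale
    rear_load = rear_reading - rear_tare      # subject's share on the rear scale
    body_weight = front_load + rear_load
    # Moments about the rear support: body_weight * x = front_load * distance.
    x_from_rear = support_distance_cm * front_load / body_weight
    return x_from_rear - support_distance_cm / 2


# Hypothetical readings in kilograms, with supports assumed 144 cm apart.
print(round(center_of_gravity_offset(41.0, 39.0, 5.0, 5.0, 144.0), 2))  # 2.06
```

Because only the ratio of the two loads enters the result, the units of the scale readings do not matter as long as both scales use the same ones.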
¹⁷Cureton, Thomas K., Jr., "Bodily Posture as an Indicator of Fitness," op. cit.
Cureton, Thomas K., Jr., Wickens, J. Stuart and Elder, Haskell P., "Reliability and Objectivity of the Springfield Postural Measurements," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2 (May, 1935), 81-92.
Cureton, Thomas K., Jr. and Wickens, J. Stuart, "The Center of Gravity of the Human Body in the Antero-Posterior Plane and Its Relation to Posture, Physical Fitness, and Athletic Ability," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2 (May, 1935), 93-105.
¹⁸Cureton, Thomas K., Jr., and Wickens, J. Stuart, op. cit.
¹⁹Ibid., p. 105.




Such a test may be very useful in a number of studies, as for 
example, the effect of high heels on the carriage of body weight, and 
the relationship between foot defects and weight balance. 

Wickens and Kiphuth's Posture Measurement.²⁰ Wickens and Kiphuth's posture measurement is based upon the scaling of angles on an antero-posterior photograph, and combines several of the features of the MacEwan-Howe²¹ and Springfield²² studies. To increase the accuracy of the scaling procedure the subject is marked in the following manner before being photographed: marks are made with a black flesh pencil on the tragus of the ear, front tip of the shoulder, acromion, greater trochanter of the femur, styloid process of the fibula, and center of the external malleolus; and aluminum pointers are attached with adhesive tape to the 7th cervical vertebra, the greatest backward convexity of the dorsal curve, the most prominent part of the sacrum, and the end of the sternum. Final preparations are made for the photograph by placing the subject on footprints painted on the floor, with feet adjusted so that a plumb bob falls through the external malleolus.

From the picture, angles are computed to determine the relative alignment of head and trunk, chest, abdomen, shoulders, trunk, hips and knees, and the amount of kyphosis, as well as the lordosis angle.
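The angles are read from marked points on the photograph. The Python sketch below shows two basic computations such scaling might use: the inclination of a segment from the plumb line, and the angle formed at a landmark by two other marked points. Coordinates in photograph units with y increasing upward, and the landmark choices in the example, are assumptions rather than Wickens and Kiphuth's published definitions.

```python
import math

Point = tuple[float, float]


def inclination_from_vertical(lower: Point, upper: Point) -> float:
    """Degrees between segment lower->upper and the plumb line.

    Positive when the upper point lies forward (larger x) of the lower point.
    """
    dx = upper[0] - lower[0]
    dy = upper[1] - lower[1]
    return math.degrees(math.atan2(dx, dy))


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Degrees of the angle at `vertex` between the rays toward a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just past +/-1
    return math.degrees(math.acos(cos_theta))


# Hypothetical photograph coordinates: trochanter, shoulder tip, tragus.
trochanter = (100.0, 200.0)
shoulder = (104.0, 400.0)
tragus = (112.0, 480.0)
print(round(inclination_from_vertical(trochanter, shoulder), 1))
print(round(angle_at(shoulder, trochanter, tragus), 1))
```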

Considerable effort was made to validate this method. Reported coefficients include a combined reliability and objectivity coefficient for head and neck of r = .721 ± .054; for kyphosis, r = .854 ± .034; and for lordosis, r = .730 ± .053. These correlations were based on photographing and rephotographing subjects with pointers affixed by two different examiners. With duplicated sets of one hundred pictures graded by two different examiners, the two judgments intercorrelated as follows: head and neck, r = .962 ± .006; kyphosis, r = .956 ± .007; and lordosis, r = .966 ± .006.
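The "±" figures appear to be probable errors, the usual way of reporting the sampling error of a correlation at the time. The text does not say so explicitly, and it does not give N for the photographing study. Assuming that convention, the sketch below computes PE = 0.6745(1 − r²)/√N, where 0.6745 converts a standard error into a probable error.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Duplicated sets of one hundred pictures: N = 100, r = .962.
print(round(probable_error_of_r(0.962, 100), 3))  # 0.005
```

For the head-and-neck judgment (N = 100, r = .962) this gives about .005, close to the reported .006.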

Use of Subjective Ratings. Difficulties in objectively measuring posture lie not only in the inherent inaccuracies of the measures themselves, but also in the inability of the subject to assume exactly the same position twice. In addition, the need to consider postures as well as posture further complicates the picture. Consequently, in practice considerable emphasis is placed at present on various types of posture screening which subjectively rate many aspects of posture, including body alignment in motion.²³ The worth of this type of measure depends upon the degree to which it meets the criteria of good rating scales, namely, carefully defined elements, expertness of the rater, and lack of bias of the rater. Cureton has concluded that posture can satisfactorily be judged by experts adequately trained in principles of body mechanics.²⁴ The worth of this type of examination is further enhanced when preliminary screening is subsequently followed up by more thorough examinations, including review of posture problems in relation to the individual's total health picture. Photographs and silhouettes are often used for motivational purposes in connection with this type of posture examination.

²⁰Wickens, J. Stuart and Kiphuth, Oscar W., "Body Mechanics Analysis of Yale University Freshmen," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. VIII, No. 4 (December, 1937), 38-48.
Wickens, J. Stuart and Kiphuth, Oscar W., "Common Postural Defects of College Freshmen," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1 (March, 1942), 102-108.
²¹MacEwan, Charlotte G. and Howe, Eugene C., op. cit.
²²Cureton, Thomas K., Jr., Wickens, J. Stuart and Elder, Haskell P., op. cit.

Foot Measurement. The objective appraisal of foot conditions also presents difficulties. The validity of measures of footprint angles to determine foot function and condition has been found to be low.²⁵ Flat-footedness, whether measured by computing angles on a pedograph print²⁶ or by use of the pedorule as proposed by Danford,²⁷ may or may not indicate poor foot condition.

Extensive studies on foot fitness have been conducted by Cureton.²⁸ He proposes a three-step appraisal which includes: (1) a questionnaire on foot history, (2) an inspection and manipulation of the feet with particular alertness for signs of poor flexibility, weakness or strain, and (3) a functional efficiency test consisting of five items, namely, vertical jump, center of gravity test, toe flexion strength as measured by portable scales, scaphoid deviation as measured by a pronation ruler, and the height of the scaphoid bone measured in centimeters.

²³McCloy, Charles H., Tests and Measurements in Health and Physical Education, "Iowa Posture Test," pp. 259-263.
Phelps, W. M. and Kiphuth, R. J. H., The Diagnosis and Treatment of Postural Defects, Chapter V. Baltimore, Charles C. Thomas, 1932.
Rathbone, Josephine L., Corrective Physical Education (Fourth Edition), Chapter V. Philadelphia, W. B. Saunders Company, 1949.
Scott, M. Gladys and French, Esther, Better Teaching Through Testing, "Rating of Posture," pp. 158-166. New York, A. S. Barnes and Company, 1945.
Sehon, Elizabeth L., et al., Physical Education Methods for Elementary Schools, "Posture Testing," pp. 48-58. Philadelphia, W. B. Saunders Company, 1948.
²⁴Cureton, Thomas K., Jr., op. cit., p. 365.
²⁵Clarke, H. Harrison, "An Objective Method of Measuring the Height of the Longitudinal Arch of the Foot," Res. Quart. Am. Phys. Educ. Assoc., Vol. IV, No. 3 (October, 1933), 99-107.
Cureton, Thomas K., Jr., "Fitness of Feet and Legs," Suppl. to Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 (May, 1941), 368-380.
²⁶Clarke, H. Harrison, op. cit.
²⁷Danford, Harold R., op. cit.
²⁸Cureton, Thomas K., Jr., op. cit.

The wide prevalence of foot disorders indicates a need for educational concern with the matter. In the absence of wholly effective objective measures, attention should be directed toward the supplemental use of subjective examinations. Stafford,²⁹ Kelly,³⁰ Rathbone³¹ and Sehon et al.³² describe criteria for judging foot conditions.


Anthropometric Measurements for Determining Nutritional Status

Though the idea of appraising the health status of the individual is not always expressed in studies involving anthropometric measurements, it is quite evident that in some studies of this nature an attempt is made to produce measurements which may have a bearing on the nutritional status of the individual. For a period of years the age-height-weight tables of the American Child Health Association were used in determining nutritional status, but they were discarded when the whole question was more carefully considered in the light of our knowledge regarding individual differences in body build or type. Despite limitations as a nutritional gauge, the periodic and cumulative record of age, height and weight for the individual pupil serves an important function in health evaluation, and as such these measures are still recommended and widely used. Present practices in the use of age, height and weight measurements emphasize interpretation of these measures in terms of the growth pattern of the individual pupil. Tables XXXVI through XXXIX in the Appendix show growth patterns in normative zones for boys and girls between the ages of four and eighteen. In college males the deviation of actual weight from predicted weight is thought to have considerable significance in relation to health status.

²⁹Stafford, George T., Corrective and Preventive Physical Education, p. 184. New York, A. S. Barnes and Company, 1928.
³⁰Kelly, Ellen, "Dynamic … of Foot Postures During Walking," Res. Quart. Am. Phys. Educ. Assoc., Vol. VIII, No. 2 (May, 1937), 132-145.
³¹Rathbone, Josephine L., op. cit., Chapter II.
³²Sehon, Elizabeth L., op. cit.




Since 1929 use has been made of instruments for measuring the relative amount of fat just under the skin, because the amount of fat on the body is closely related to nutritional conditions. More recently it has been shown that a few simple anthropometric relationships are extremely valuable in determining nutritional status.³³

McCloy³⁴ recommends as routine in the school health examination the following anthropometric measurements: height, weight, hip width, chest circumference, leg girth, width of elbow or knee, and girths of upper arm, forearm and thigh. Meredith and Stuart³⁵
suggest as minimum routine measures: height, weight, hip width, 
chest circumference, leg girth, and subjective ratings of thickness of 
two selected folds of the skin and subcutaneous tissue, namely above 
the crest of the left ilium, and below and lateral to the inferior angle 
of the scapula. Selected percentile tables for five body measurements 
have been devised for boys and girls five to eighteen years old. 

The ACH Index of Nutritional Status.³⁶ Following a very thorough analysis of measurements of a large number of children in seventy-five cities in the United States, it was found that certain measures had no relation to nutritional status and that the measures which offer an adequate picture of soft tissue in relation to skeletal build are: (1) hip width, (2) chest depth, (3) chest width, (4) height, (5) weight, (6) arm girth, and (7) subcutaneous tissue over the upper arm. "With these measures it is possible to find the children who are lowest in three important respects, namely, arm girth for skeletal build, subcutaneous tissue for skeletal build and weight for skeletal build."³⁷ Since the use of seven measures is rather cumbersome, it was finally decided that three measures offered a simple combination to select children whose nutritional status should be studied by physicians. According to the standards set, the ACH Index selects from 5 per cent to 25 per cent of the children, and this group will contain nearly all of the extreme defect cases to be referred to physicians. The following measurements are taken to determine the ACH Index: girth of upper arm (A), depth of chest (C), and width of hips (H), all recorded in centimeters and tenths. The girth of the upper arm is the sum of the girths with arm flexed and arm relaxed, while the chest depth measurement is the sum of the measurements at inspiration and expiration. By means of tables showing the minimum difference between arm girth and chest depth for a given width of hips, cases are selected which are quite certain to be deficient in nutritional status.

³³Allen, Ross L., "Weight Deviation and Health in a College Group," Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 3 (October, 1936), 89-98.
³⁴McCloy, Charles H., Appraising Physical Status: The Selection of Measurements. University of Iowa Studies in Child Welfare, Vol. XII, No. 2. Iowa City, State University of Iowa, 1936.
McCloy, Charles H., Appraising Physical Status: Methods and Norms. University of Iowa Studies in Child Welfare, Vol. XV, No. 2. Iowa City, State University of Iowa, 1938.
³⁵Stuart, Harold C. and Meredith, Howard V., "Use of Body Measurements in the School Health Program," Am. Jr. of Pub. Health, Vol. 36, No. 12 (December, 1946), 1365-1385.
³⁶Franzen, Raymond and Palmer, George T., The ACH Index of Nutritional Status. New York, The American Child Health Association, 1934.
Franzen, Raymond, Physical Measures of Growth and Nutrition. New York, The American Child Health Association, 1929.
³⁷Franzen, Raymond and Palmer, George T., op. cit., p. 5.
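The ACH selection rule can be sketched in Python. The published minimum-difference tables are not reproduced in this chapter, so the sketch takes them as a caller-supplied function, and it assumes a child is selected when A − C falls below the tabled minimum for that hip width. The class and function names, the child's measurements, and the threshold in the example are hypothetical placeholders, not ACH values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AchMeasurements:
    """Raw ACH measurements, in centimeters and tenths."""
    arm_girth_flexed: float
    arm_girth_relaxed: float
    chest_depth_inspiration: float
    chest_depth_expiration: float
    hip_width: float


def ach_difference(m: AchMeasurements) -> float:
    """Arm girth (A) minus chest depth (C), each the sum of two readings."""
    a = m.arm_girth_flexed + m.arm_girth_relaxed
    c = m.chest_depth_inspiration + m.chest_depth_expiration
    return round(a - c, 1)


def refer_to_physician(m: AchMeasurements,
                       minimum_difference: Callable[[float], float]) -> bool:
    """Select the child when A - C falls below the tabled minimum for hip width H.

    `minimum_difference` stands in for the published ACH tables.
    """
    return ach_difference(m) < minimum_difference(m.hip_width)


# Hypothetical child and a placeholder threshold (NOT the published ACH table).
child = AchMeasurements(20.0, 18.5, 15.0, 14.0, 24.0)
print(ach_difference(child))                      # 9.5
print(refer_to_physician(child, lambda h: 10.0))  # True
```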

The Wetzel Grid Technique.³⁸ Wetzel has developed a method of evaluating physical fitness based upon the application of age, height and weight measures to a scaled grid. The grid includes seven principal channels. The crosswise axes cover all ranges of body build from extremely thin to obese. Lengthwise the channels accommodate development from infancy to maturity.

To obtain grid ratings, age, height and weight are plotted on the grid, from which direct estimates are made of body build, physical status, relative age advancement, nutritional grade and developmental level. Successive observations supply a record indicating not only whether the progress of the individual child's growth and development is satisfactory, but also the progress which may be expected in the future.

Normal development proceeds in a channel on the grid. A shift in channel indicates a change in physique. Schedules of developmental progress, called auxodromes, indicate expected age developmental levels.


In validation of the method, grid ratings of 2,093 school children compared with physicians' estimates were found to be in 94 per cent agreement, except for those children rated fair, on whom the physicians themselves had difficulty in agreeing. The grid revealed 94.5 per cent of poor or borderline cases.

³⁸Wetzel, Norman C., "Physical Fitness in Terms of Physique, Development and Basal Metabolism," Jr. Am. Med. Assoc., Vol. 116, No. 12 (March 22, 1941).
Wetzel, Norman C., "Assessing the Physical Condition of Children," Jr. of Pediatrics, Vol. 22, No. 1 (January, 1943), 82-110; No. 2 (February, 1943), 208-225; No. 3 (March, 1943), 329-361.
Wetzel, Norman C., "The Simultaneous Screening and Assessment of School Children," Jr. of Health and Phys. Educ., Vol. 13, No. 10 (December, 1942), 576-577, 622.

Increased use is being made of the grid technique for the examination of school children. While original plottings can be made by the teacher, caution must be exercised when an unqualified observer renders judgments. This method, however, proves useful in the school health examination program when used under the supervision of a physician.

Pryor's Width-Weight Tables.³⁹ In an endeavor to find a
measure which, when considered with height, weight and age, would 
prove valuable in determining the nutritional status of individuals, 
Pryor took careful measurements of adolescent boys and girls every 
six months over a four-year period. She has shown that bi-iliac 
diameter or width of the pelvic crest is a reliable measure for use in 
predicting body build during the period of most rapid growth and 
that, because of the extreme ranges of body build, it is desirable to 
offer seven normal weights for each height and age depending upon 
the width of the iliac crest. From measurements taken on a wide 
sampling of children and adults, width-weight tables have been set 
up for various age ranges. These offer a much more accurate esti- 
mate of nutritional status than can be determined from the usual 
height-weight measurements. 

Indices of Stature and Build. Numerous indices have been used from time to time in physical education to indicate measures of health, build, robustness, anthropological relations and the like. Though they are not now commonly employed, they may be valuable for comparing the bodily relationships of individuals. Their use as indicators of physical efficiency or physical capacity is exceedingly limited. For example, a high sitting-standing index (stature index) indicates a long type of body for the particular individual in question and may indicate comparatively large vital organs, but, of two persons having the same index, one may be a 6-footer and the other only 5 feet 2 inches in height. The same is true of all the other indices. Individuals of the same height or weight may be compared, but there comparison ceases. Then, too, there is practically no indication that a high index of any sort has a very decided correlation with efficiency in the use of the body insofar as skill is concerned.

³⁹Pryor, Helen B., Width-Weight Tables. For Boys and Girls 1 to 17 Years; For Men and Women 13 to 41 Years. Stanford University, California, Stanford University Press, 1940.


1. The best index of build⁴⁰ $= \dfrac{\text{Weight}}{(\text{Height})^2}$
2. Type of stature index⁴¹ $= \dfrac{100 \times \text{Sitting height}}{\text{Total height}}$
3. Ponderal index⁴² $= \dfrac{100 \times \sqrt[3]{\text{Weight}}}{\text{Height}}$
4. Height-weight index⁴³ $= \dfrac{\text{Weight in kilograms}}{\text{Height in centimeters}}$
5. Chest index (Montessori's thoracic) $= \dfrac{100 \times \text{Depth of chest}}{\text{Width of chest}}$
6. Vital index⁴⁴ $= \dfrac{\text{Vital capacity}}{\text{Weight}}$
7. Vital index⁴⁵ $= \dfrac{\text{Vital capacity}}{\text{Height}}$
8. Stature index⁴⁶ $= \dfrac{\text{Weight}}{(\text{Height})^2}$
9. Respiratory-height coefficient⁴⁷ $= \dfrac{\text{Lung capacity} \times \text{Chest expansion}}{\text{Height}}$
10. Barnhardt's index⁴⁸: $\text{Weight in kilograms} = \dfrac{\text{Chest circumference} \times \text{Height}}{240}$
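Four of the listed indices translate directly into code. This Python sketch assumes they take their usual forms: best index of build as weight divided by height squared, type of stature index as 100 × sitting height / total height, height-weight index as kilograms per centimeter, and Barnhardt's index as chest circumference × height / 240 (both in centimeters, giving kilograms). The function names and the subject's measurements are hypothetical.

```python
def best_index_of_build(weight: float, height: float) -> float:
    """Weight divided by the square of height (units chosen by the user)."""
    return weight / height ** 2


def type_of_stature_index(sitting_height: float, total_height: float) -> float:
    """Sitting height as a percentage of total height."""
    return 100 * sitting_height / total_height


def height_weight_index(weight_kg: float, height_cm: float) -> float:
    """Kilograms of body weight per centimeter of height."""
    return weight_kg / height_cm


def barnhardt_weight_kg(chest_circumference_cm: float, height_cm: float) -> float:
    """Expected weight (kg): chest circumference times height, divided by 240."""
    return chest_circumference_cm * height_cm / 240


# Hypothetical subject: 175 cm tall, 91 cm sitting height, 68 kg, chest 88 cm.
print(type_of_stature_index(91, 175))            # 52.0
print(round(barnhardt_weight_kg(88, 175), 1))    # 64.2
print(round(height_weight_index(68, 175), 3))    # 0.389
```

As the paragraph on indices notes, two people with the same value of any of these ratios may still differ greatly in size, so the numbers are only meaningful alongside the raw measurements.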


⁴⁰Davenport, C. B., "The Best Index of Build," Am. Stat. Assoc. Pub., XVII (September, 1920), 541.
⁴¹Montessori, Maria, Pedagogical Anthropology. New York, Frederick A. Stokes Company, 1913.
⁴²(Devised by …, 1874.)
⁴³DeBusk, B. W., "Height, Weight, Vital Capacity and Retardation," Ped. Sem., XX (1913).
⁴⁴Howe, Eugene C., "A Preliminary Selection of Tests of Fitness," Am. Phys. Educ. Rev., XXIX (December, 1924), 568.
⁴⁵Mentioned by C. B. Davenport, op. cit.; used by Bardeen.
⁴⁶Williams, J. F., Principles of Physical Education, p. 109. Philadelphia, W. B. Saunders Company, 1927.




11. Weight prediction formula (women).⁴⁹ The weight of college women may be predicted from the regression equation:

$$X_1 = 9.4X_2 + 8.5X_3 + 1.7X_4 - 214.84$$

where $X_1$ = weight, $X_2$ = pelvic breadth, $X_3$ = shoulder breadth, and $X_4$ = height.

12. Another regression equation for the prediction of the weight of college women⁵⁰:

$$\text{Weight} = 2.6 \times (\text{height in inches} + \text{chest depth in cm} + \text{chest width in cm}) - 154.5$$

13. Weight prediction formula for eighteen-year-old college males⁵¹:

$$\text{Predicted weight} = 1.203 \times \text{chest girth expanded (in.)} + 1.168 \times \text{biceps girth contracted (in.)} + 2.654 \times \text{hip width (cm)} - 19.006$$
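The three regression equations can be evaluated directly. In this Python sketch the function names are hypothetical and the coefficients are copied from the equations; units follow the text, and since formula 11 states no units here, none are assumed for it. The sample measurements are made up for illustration.

```python
def weight_women_breadths(pelvic_breadth: float, shoulder_breadth: float,
                          height: float) -> float:
    """Formula 11: X1 = 9.4 X2 + 8.5 X3 + 1.7 X4 - 214.84 (units as in the source study)."""
    return 9.4 * pelvic_breadth + 8.5 * shoulder_breadth + 1.7 * height - 214.84


def weight_women_chest(height_in: float, chest_depth_cm: float,
                       chest_width_cm: float) -> float:
    """Formula 12: 2.6 x (height in inches + chest depth + chest width in cm) - 154.5."""
    return 2.6 * (height_in + chest_depth_cm + chest_width_cm) - 154.5


def weight_men_eighteen(chest_girth_expanded_in: float, biceps_contracted_in: float,
                        hip_width_cm: float) -> float:
    """Formula 13, for eighteen-year-old college men."""
    return (1.203 * chest_girth_expanded_in + 1.168 * biceps_contracted_in
            + 2.654 * hip_width_cm - 19.006)


# Hypothetical measurements.
print(round(weight_women_chest(64, 19, 27), 1))   # 131.5
print(round(weight_men_eighteen(38, 13, 29), 3))  # 118.858
```

The difference between a student's actual weight and the value such an equation predicts is the "deviation of actual weight from predicted weight" mentioned earlier in connection with health status.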


Studies Involving Body Build or Type. The development of stature indices⁵² involving anthropometric measurements has stimulated a number of researches on the relation between body build or type and performance. Wertheimer and Hesketh⁵³ have shown that type of physical architecture is related to mental capacities and peculiarities. Cossman and others at the University of Oregon have made some unpublished studies to show a positive relationship between types of build and athletic ability. Cozens⁵⁴ has pointed out that there are significant differences in performance ability among classification groups of college men based on height and weight. Tall men are better than short men in many phases of athletic ability. Being slender is a definite handicap to scoring well on a battery of athletic ability tests, while carrying additional weight is an asset. Short men have an advantage over tall men in only a few events, such as the rope climb. In events requiring endurance, such as distance running, both additional weight and a lack of it are unfavorable to good performance.

⁴⁹Highsmith, J. A. and Sorenson, D., "A Tentative Weight-Prediction Formula," Am. Phys. Educ. Rev., XXXIII (September, 1928), 448-450.
⁵⁰Ludlum, F. E. and Powell, Elizabeth, "Chest-Height-Weight Tables for College Women," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 3 (October, 1940), 55-58.
See Margaret B. Craig, Selected References, for a comparison of several other similar and recent equations, and also: Turner, Abby H., "Body Weights Optimal for Young Adult Women," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 3 (October, 1943), 255-276.
⁵¹Allen, Ross L., "Weight Deviation and Health in a College Group," Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 3 (October, 1936), 89-98.
⁵²Montessori, Maria, Pedagogical Anthropology. New York, F. A. Stokes, 1913. Type of Stature Index.
Davenport, C. B., op. cit.
⁵³Wertheimer, F. I. and Hesketh, Florence E., "The Significance of the Physical Constitution in Mental Diseases," Medicine, Vol. V, No. 4 (November, 1926), 375-462.
⁵⁴Cozens, Frederick W., Achievement Scales in Physical Education Activities for College Men, pp. 9-12. Philadelphia, Lea and Febiger, 1936.

Willoughby55 presents arguments to show that the relative
proportions of the human figure are of greater physiologic significance
than weight, a measure only of bulk, and offers a method which will
yield the optimal measurements of all parts of the body.

Lookabaugh56 has developed a strength prediction formula based
on weightings of chest circumference, elbow width, and knee width,
which correlates .6370 with strength tests.

Breitinger,57 a German writer, studied the relationship between
body form and physical achievement of boys between the ages of
ten and one-half and twenty. The purpose of the investigation was
to determine whether or not the physical characteristics of boys
should be taken into consideration when evaluating their physical
fitness. In general he concludes that individual physical or
anthropometric measurements have no particular mechanical or physiologic
significance in relation to performance. Rather, he indicates, "the
total performance is determined by the physiologic age and the
constitutional type."58

In an attempt to discover certain anthropometric measurements
which would be of importance to track coaches in the selection of
men for training in the high jump, Krakower has found that skeletal
measurements "have little influence on the height to which
individuals jump, but in so far as there is a relationship, it is best
reflected by a combination of height, length of legs, and breadth of
foot."59 The multiple correlation is approximately .44.

55Willoughby, ..., "... Anthropometric Method for Arriving at the
Optimal ...," Res. Quart. Am. ...

56Lookabaugh, ..., "... Total Potential Strength of Adult Males from
Skeletal Build," Res. Quart. Am. Assoc. Health, Phys. Educ., and Rec.,
Vol. VIII, No. 2 (May, 1937), ...

57Breitinger, ..., "... Achievement of Youths" (translated from the German
and condensed by Ernst Thoma), Res. Quart. Am. Phys. Educ. Assoc.,
Vol. VI, No. 2 (May, 1935), 83-91.

59Krakower, Hyman, "Skeletal Symmetry and High Jumping," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 (May, 1941), 218-227.


58 The Status of Measurement in Physical Education 


Sheldon: Somatotypes.60 Additional experimental work on
the relationship of performance to body type has recently been done
on the basis of Sheldon's classification of physical types. Sheldon
and his associates have developed a classification method, which
determines what is called a somatotype, based on the patterning of
the morphological components of an individual. Three basic
components are recognized: (1) endomorphy, characterized by a
dominance of soft roundness, (2) mesomorphy, characterized by a
dominance of muscle, bone and connective tissue, and (3) ectomorphy,
characterized by a dominance of linearity and fragility.
The somatotype of an individual is determined by analyzing the
degree of presence of each of the three components. Standardized
photographic techniques are utilized, and all analyses are made
from anthropometric measurements on photographs showing front,
back and side views. The relative presence of a given component is
described by a seven-point scale; the numeral 1 indicates the
minimum degree of presence of a given component, and 7 the maximum,
with 4 being the mid-point. Thus, an extreme endomorph would
have a somatotype rating of 711 (to be read seven, one, one, not
seven hundred eleven); an extreme mesomorph, 171; and an extreme
ectomorph, 117. As would be expected, extreme cases are relatively
rare, while 444, 443, 353, 344, 334, 244, and 235 are among the more
frequently found types. Seventy-six different somatotypes have
been isolated, which is an arbitrary number resulting from the use
of a seven-point scale.

Fig. 6. Three types of physique as classified by Sheldon: A, mesomorph;
B, endomorph; and C, ectomorph. (Williams, Principles of Physical
Education.)

60Sheldon, W. H., Stevens, S. S. and Tucker, W. B., The Varieties of Human
Physique. New York, Harper and Brothers, Publishers, 1940.
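The three-digit notation is easy to misread as a single number. The following minimal sketch (the function names `parse_somatotype` and `describe` are hypothetical, not Sheldon's) splits a rating into its endomorphy, mesomorphy and ectomorphy components and rejects any digit outside the 1 to 7 scale described above.

```python
from __future__ import annotations

COMPONENTS = ("endomorphy", "mesomorphy", "ectomorphy")
WORDS = ["one", "two", "three", "four", "five", "six", "seven"]


def parse_somatotype(rating: str) -> tuple[int, int, int]:
    """Split a rating such as "711" into (endomorphy, mesomorphy, ectomorphy).

    Each digit is read separately ("seven, one, one"), never as a
    three-digit number, and must lie on the seven-point scale.
    """
    if len(rating) != 3 or not rating.isdigit():
        raise ValueError(f"expected three digits, got {rating!r}")
    endo, meso, ecto = (int(ch) for ch in rating)
    for name, value in zip(COMPONENTS, (endo, meso, ecto)):
        if not 1 <= value <= 7:
            raise ValueError(f"{name} must be between 1 and 7, got {value}")
    return endo, meso, ecto


def describe(rating: str) -> str:
    """Spell out a rating the way the text says it should be read."""
    return ", ".join(WORDS[d - 1] for d in parse_somatotype(rating))


print(parse_somatotype("711"))  # (7, 1, 1): extreme endomorph
print(describe("171"))          # one, seven, one
```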

Conclusive studies have not yet been made on women, but
Sheldon indicates from incomplete studies that while the same
seventy-six somatotypes occur among women, the distribution
among the various types differs.61

Cureton62 reports several studies on performance in relation to
somatotypes, although he utilized a modification of the Sheldon
technique in determining somatotypes. In studies on Springfield
College men Cureton concluded that the mesomorphic group scores
highest on athletic events involving strength and power; that the
flexibility of the ectomorphic group serves it best in tests such as the
Brace Motor Ability Test; and that ectomorphs do less well in swimming
than mesomorphs and meso-endomorphs.63

Cureton also reports that American Olympic swimmers approximate
the somatotypes 545 and 454, and that no extreme endomorphs or
ectomorphs are found among competitive Olympic swimmers.64

Seltzer65 concludes that it is generally futile to attempt to find
relationships between isolated body measurements and physical
capacity, but indicates the advantages of total physique
description, such as somatotyping, in this connection.

Despite negative and conflicting results in a number of studies, 
the research worker in physical education will find anthropometry 
a useful tool. There is room for much further investigation in this 
field, and continuous research is being conducted.66

61Ibid., p. 66.
62Cureton, Thomas K., Jr., Physical Fitness Appraisal and Guidance, pp. 70-135.
St. Louis, The C. V. Mosby Company, 1947.
63Ibid., pp. 108-112.
64Ibid., pp. 107-108.
65Seltzer, Carl C., "Anthropometric Characteristics and Physical Fitness," Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 17, No. 1 (March,
1946), 10-20.
66Fiske, Donald W., "A Study of Relationships to Somatotype," Journal of
Applied Psychology, Vol. 28, No. 5 (December, 1944), 504-519.
Sanford, R. N., Adkins, M. M., Miller, R. B. and Cobb, E. A., "Physique,
Personality, and Scholarship," Society for Research in Child Development,
National Research Council, Washington, D. C., 1945.
Sheldon, W. H. and Stevens, S. S., The Varieties of Temperament. New York,
Harper and Brothers, Publishers, 1944.
Zeldler, ..., "... of the Study of Body Types for Physical Education," Jr. of
Health and Phys. Educ., Vol. 19, No. 4 (April, 1948), 241-249.


Selected References 


BOARDMAN, ROBERT: "World's Champions Run to Types,” Jr. Health and Phys. 
Educ., Vol. IV, No. 5 (May, 1933), 32-33, 62.
Here are presented some very interesting viewpoints regarding the physical 
type necessary for excellence in track and field events. 


BROWNELL, CLIFFORD LEE: A Scale for Measuring the Antero-Posterior Posture of
Ninth Grade Boys. New York: Bureau of Publications, Teachers College,
Columbia University, Contributions to Education No. 525, 1928. Pp. vii and 52.

Despite a wide variation in expert opinion as to what constitutes good
posture, the standards of posture developed in this scale have been evolved by
scientific procedures and may be used, as would a handwriting scale, to obtain
definite measurement scores.


COBB, W. MONTAGUE: "Race and Runners," Jr. Health and Phys. Educ., Vol.
VII (January, 1936), 3-7, 52-56.

During the past few years, the seeming superiority of the negro over the
white track athlete has resulted in a great deal of discussion pro and con. Cobb
points out that there are no negroid anatomic characteristics involved in the
present dominance of negro athletes in the short dashes and jumps.


Cozens, FREDERICK W.: Achievement Scales in Physical Education Activities for 
College Men, Chapter I. Philadelphia, Lea and Febiger, 1936. Pp. 118. 

Though age has no relationship to performance ability with college men, the 
factors of height and weight are of importance. A method of grouping college 
men according to these factors has been devised and is described on pp. 8 and 9. 
Real differences between the performance ability of various stature groups show 
the logic of giving consideration to both height and weight. 


CRAIG, MARGARET BELL: “A Comparison of Five Methods Designed to Predict 
the ‘Normal’ Weight of College Women,” Res. Quart. Am. Assoc. for Health, 
Phys. Educ., and Rec., Vol. 15, No. 1 (March, 1944), 64-74. 

Compares the Medico-Actuarial Mortality Investigation age-height-weight
standards, the revised Pryor Width-Weight Tables, the Boillin weight
expectancy regression equation, the Ludlum method of weight prediction, and
the McCloy method for appraising physical status, indicating some of the
limitations of such measures.


CURETON, THOMAS K., JR.: Physical Fitness Appraisal and Guidance. St. Louis,
The C. V. Mosby Company, 1947. Pp. 566.

Chapter 4 deals with the appraisal of body types as an approach to health
and fitness guidance; and Chapter 5, with weight analysis as a basis for health
and fitness guidance.


CURETON, THOMAS K., JR.: "Bodily Posture as an Indicator of Fitness," Suppl.
to Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2
(May, 1941), 349-367.

Reviews research and problems in posture measurement, and includes an
extensive bibliography on the subject.


KELLY, ELLEN D.: "A Comparative Study of Structure and Function of Normal,
Pronated and Painful Feet Among Children," Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 18, No. 4 (December, 1947), 291-312.

Gives data resulting from a study of groups having normal, pronated, and
painful feet. Thirty-five anthropometric and x-ray variables were considered
which are related to statics, flexibility, strength, morphology, and sensitiveness
to pain.



MASSEY, WAYNE W.: "A Critical Study of Objective Methods for Measuring
Anterior Posterior Posture with a Simplified Technique," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 1 (March, 1943), 3-22.

Presents a fairly complete review of subjective, semi-objective, and objective 
posture measurement techniques, with critical comments. Describes in addi- 
tion a simplified technique based on angle analysis. Includes an extensive 
bibliography. 

McCLOY, C. H.: "Anthropometry in the Service of the Individual," Jr. of Health
and Phys. Educ., Vol. V, No. 7 (September, 1934), 7-11, 46-47.

McCloy points out that the new anthropometry, "through the use of the
proper structural and functional measurements offers a more valid method of
appraising with a fair degree of accuracy the physical status of the individual."

He discusses and appraises the use of the following anthropometric measures:
body weight, amount of fat, muscular development, capacity of the lungs, rate
of metabolism, body type and present health condition.

PALMER, CARROLL E.: "Studies of the Center of Gravity in the Human Body,"
Child Development, Vol. 15, Nos. 2 and 3 (June, 1944), 99-165.

Describes a method for determining the center of gravity in any individual 
by the determination of two planes, a transverse and frontal. Reviews other 
literature on the topic. 

SHELDON, W. H., STEVENS, S. S. and TUCKER, W. B.: The Varieties of Human
Physique. New York, Harper and Brothers, Publishers, 1940. Pp. 547.
Describes the purposes, procedures and implications of somatotyping.

SHELDON, W. H. and STEVENS, S. S.: The Varieties of Temperament. New York,
Harper and Brothers, Publishers, 1944. Pp. 520.

Presents results of a similar study on temperament classification, indicating
the relationship between that and physique classification.


CHAPTER IV 


Cardiac Functional Tests 


The field of cardiovascular measurement seems at first glance to be
far removed from that of strength testing, and yet it is the direct
outcome of findings of the physiologists interested in the strength of
muscles. Mosso and Martin had shown that muscular efficiency
was modified by factors of circulation, nutrition and fatigue. If
this relationship is constant, if good muscular activity can be
expected when heart, blood vessels and nervous system are functioning
properly and the converse explains a poor result, then why not
measure heart beat and blood pressure (which can easily be done)
rather than make the more extended calculations on a great number
of muscles? Inasmuch as these internal factors are capable of
modifying muscular action, they must be more basic and
fundamental, and logically must come first in any accurate
estimation of physical ability.

Principles Involved in Cardiovascular Tests. Within certain
limits, changes in heart rate and arterial pressure can be expected
upon assuming an erect position. Arterial pressure is maintained
through changes in the tonus of the blood vessels as well as by
changes in the heart rate. Several factor analysis studies have
attempted to describe the principal components of cardiovascular
variables.1 There appear to be readily defined variables of pulse rate
and blood pressure, but the degree of modification of these factors by
physiological mechanisms has not been clearly determined. The
nervous system coordinates the activities of the heart beat and
vasomotor system so that under ordinary circumstances blood
pressure is kept normal, but in addition the nervous system brings
about a compensation for any failure on the part of one organ,
either heart or blood vessels, by increasing the activity of the other.

1McCloy, Charles H., "A Study of Cardiovascular Variables by the Method of
Factor Analysis," p. 238. Tests and Measurements in Health and Physical
Education, New York, F. S. Crofts and Company, 1942.
Larson, Leonard A., "A Factor-Analysis of Some Cardiovascular-Respiratory
Variables and Tests," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 18, No. 2 (May, 1947), 109-122.
Murphy, Mary Agnes, "A Study of the Primary Components of Cardiovascular
Tests," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 1
(March, 1940), 57-71.

In a change from the reclining to the erect position, gravity
immediately produces a condition that must be counteracted by
increased heart beat, increased arterial tonus, or both. The amount
of change in arterial pressure produced and the accompanying
changes in heart rate can be ascertained for normal individuals,
and any deviations from these norms can be recorded. Changes in
blood pressure and heart beat in assuming an erect position may
also indicate directly a general condition of muscular tonus. In case
of poor posture and weak tone of the abdominal muscles, the force of
gravity may cause an overextension of the blood vessels, collecting
undue quantities of blood in the large abdominal veins and
establishing what are technically known as "blood lakes." The
immediate reaction is a lowered blood pressure and a consequent
increase in heart beat to force more blood into the arteries in an
attempt to raise the pressure. If, however, the muscle tonus is high,
then upon standing the abdominal muscles will contract, pressure
will be exerted on the abdominal veins, "blood lakes" will not form,
and the heart rate will not need to be increased materially to
maintain a normal pressure.

This entire mechanism, including nervous system, heart, vasomotor
apparatus and skeletal muscles, works to maintain proper
physiologic conditions. If one factor is weak, the others work the
harder to overcome that defect. As long as these conditions can be
kept within normal limits, we say there is "compensation," whether
there are defects or not; but as soon as this mechanism breaks
down, we say there is "lack of compensation," and some more or
less serious condition is indicated. Conditions which upset normal
functioning of the body are reflected by this compensating




mechanism. Digestive upsets and bacterial poisons absorbed in the
blood affecting the nervous tissues, oversecretion or unusual
action of the internal secreting glands, fatigue as a result of hard
muscular labor or of extreme mental exertion: all these are types
of conditions which may unbalance this delicate regulating device.

All of the tests in this section are attempts to estimate degrees of 
compensation and to evaluate physical condition on the basis of 
this cardiac-vasomotor mechanism. 

Crampton's "Blood Ptosis" Test.2 Crampton has made use
of these physiologic phenomena and has set up a rating scheme in
which he attempts to get at least an approximate idea of the general
condition of an individual by comparing the values of the reclining
heart rate and blood pressure with corresponding values in the
erect position. A great deal of experimentation on New York City
high school pupils led him to the conclusion that the change from
the reclining to the erect position increases the heart rate by
0 to 44 beats per minute and causes variations in the systolic blood
pressure ranging from -10 mm. to +10 mm. Hg. He assigns equal
values to these variations and has constructed the following table,
by which he expresses the vascular tone of the individual in
percentages:

TABLE II
CRAMPTON'S RATING SCHEME FOR GENERAL CONDITION

                          Systolic Blood Pressure
Heart rate        Increase                          Decrease
increase      +8   +6   +4   +2    0   -2   -4   -6   -8  -10
              95   90   85   80   75   70   65   60   55   50
              90   85   80   75   70   65   60   55   50   45


2Crampton, C. Ward, "A Test of Condition," Med. News, LXXXVII (September, ...).
Start Today, pp. 130-135. New York, Association Press, 1941.




For an increase in pressure higher than +10, add 5 per cent to the
+10 column for each 2 mm. in excess of 10. For a decrease in pressure
lower than -10, subtract 5 per cent from the -10 column for each
2 mm. lost.

Thus a person showing an increase in heart rate of five to eight
beats per minute upon assuming the erect position and an increase
in his systolic blood pressure of 10 mm. Hg is said to be in excellent
general condition, while one whose increase in heart rate is 15 and
whose increase in systolic pressure is only 8 mm. is not in such good
condition. Grades of fitness are assigned as follows:

Grade A   Score of 90 per cent or over;
Grade B   Score between 80 per cent and 90 per cent;
Grade C   Score between 70 per cent and 80 per cent.
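The extension rule for pressure changes beyond the table and the grade boundaries can be written out as two small functions. This is a minimal sketch with hypothetical names: it takes the +10 (or -10) column value of the subject's heart-rate row from Table II as an input rather than encoding the table, it counts only whole 2 mm. steps beyond 10, and it assumes a score of exactly 80 or 70 falls in the higher grade, since the text does not settle those boundaries.

```python
def crampton_beyond_table(edge_score: float, pressure_change_mm: int) -> float:
    """Extend Table II past the +10 / -10 mm. Hg columns.

    edge_score: value in the +10 column (for rises) or the -10 column
    (for falls) of the subject's heart-rate row, read from Table II.
    """
    if pressure_change_mm > 10:
        steps = (pressure_change_mm - 10) // 2  # whole 2 mm. steps only (assumption)
        return edge_score + 5 * steps
    if pressure_change_mm < -10:
        steps = (-10 - pressure_change_mm) // 2
        return edge_score - 5 * steps
    raise ValueError("changes from -10 to +10 mm. are read directly from Table II")


def crampton_grade(score: float) -> str:
    """Letter grade for a Crampton percentage score."""
    if score >= 90:
        return "A"
    if score >= 80:      # assumption: exactly 80 counts as B
        return "B"
    if score >= 70:      # assumption: exactly 70 counts as C
        return "C"
    return "ungraded"    # the text assigns no grade below 70 per cent


print(crampton_beyond_table(80, 14))  # 90: two 2 mm. steps above +10
print(crampton_grade(95))             # A
```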


McCurdy's Condition Test.3 McCurdy's work with adolescent
boys led him to the conclusion that, if the change in heart rate from
the reclining to the erect position exceeds 20 beats per minute, the
individual should be advised to consult a physician. After moderate
exercise (each knee raised alternately to a right angle ten times in
twenty seconds), the heart rate should return to the pre-exercise rate
within two minutes. For athletes, after an exercise consisting of
running in place twenty times in five seconds, the heart rate should
return to normal within two minutes.
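McCurdy's two criteria are simple threshold checks. The sketch below states them directly; the function names are hypothetical, and it assumes all rates are already expressed in beats per minute.

```python
def mccurdy_refer_to_physician(reclining_rate: int, standing_rate: int) -> bool:
    """True when the reclining-to-standing rise exceeds 20 beats per minute."""
    return standing_rate - reclining_rate > 20


def mccurdy_recovered(pre_exercise_rate: int, rate_after_two_minutes: int) -> bool:
    """True when the pulse is back to (or below) the pre-exercise rate at two minutes."""
    return rate_after_two_minutes <= pre_exercise_rate


print(mccurdy_refer_to_physician(68, 92))  # True: a rise of 24 beats
print(mccurdy_recovered(76, 80))           # False: still 4 beats above
```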

Meylan's Test.4 This test was devised in connection with the 
physical examination for men given at Columbia University and is 
only one part of the general examination which included: 

1. General condition: (a) nutrition, (b) color of skin, (c) general
appearance.

2. Pulse rate: rhythm and character in (a) horizontal, and
(b) vertical position.

3. Blood pressure: (a) horizontal, and (b) vertical position.

4. Test of the heart's reaction to exercise: (a) the subject hops
a distance of 100 feet; (b) the heart rate is counted at the apex
during four consecutive fifteen-second periods immediately after
hopping; (c) the percentage of increase from the normal during the
first fifteen-second period is noted; (d) the percentage of recovery
from the first fifteen-second period to the fourth is noted.

3McCurdy, J. H. and Larson, Leonard A., Physiology of Exercise (Third Edition),
pp. 267-269. Philadelphia, Lea and Febiger, 1939.
McCurdy, J. H., "Adolescent Changes in Heart Rate and Blood Pressure,"
Am. Phys. Educ. Rev., XV (June, 1910), 421.
4Meylan, G. L., "Twenty Years Progress in Tests of Efficiency," Am. Phys. Educ.
Rev., XVIII (October, 1913), 442.

Favorable signs in connection with (c): less than 100 per cent
increase. Unfavorable: more than 100 per cent increase.

Favorable signs in connection with (d): more than 80 per cent
recovery. Unfavorable: less than 80 per cent recovery.
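The two percentages in items (c) and (d) can be computed from the four apex counts. This is a minimal sketch under stated assumptions: each fifteen-second count is multiplied by four to give a rate per minute, and "recovery" is taken to mean the share of the post-exercise excess over the normal rate that has disappeared by the fourth period. Meylan's exact formula is not given in the text, so that definition and the function name are hypothetical.

```python
from __future__ import annotations


def meylan_heart_reaction(normal_rate: float, counts_15s: list[int]) -> dict:
    """normal_rate: pulse per minute before hopping.
    counts_15s: apex counts for the four consecutive 15-second periods."""
    if len(counts_15s) != 4:
        raise ValueError("expected four fifteen-second counts")
    first_rate = counts_15s[0] * 4   # 15-second count converted to per minute
    fourth_rate = counts_15s[3] * 4
    excess = first_rate - normal_rate
    increase_pct = 100 * excess / normal_rate
    # Assumption: recovery = share of the post-exercise excess that has
    # disappeared by the fourth period.
    recovery_pct = 100 * (first_rate - fourth_rate) / excess if excess > 0 else 100.0
    return {
        "increase_pct": increase_pct,
        "recovery_pct": recovery_pct,
        "increase_favorable": increase_pct < 100,
        "recovery_favorable": recovery_pct > 80,
    }


result = meylan_heart_reaction(72, [33, 27, 22, 19])
print(round(result["increase_pct"], 1), round(result["recovery_pct"], 1))  # 83.3 93.3
```

Exactly 100 per cent increase or 80 per cent recovery is reported as not favorable here, because the text classifies only values strictly above or below those limits.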

Foster's Test.5 The object of the Foster test is to determine
how the heart behaves after a mild form of exercise. The test is
built upon the fact that exercise increases the frequency of the heart
in almost direct proportion to the intensity of the exercise. Hence,
if the heart's frequency is greatly increased by a mild form of
exercise, the individual being examined is not in proper physical
condition. By standardizing the exercise and determining the
reaction of persons in good physical condition, Foster was able to
set up norms of condition.


The method of making the test is as follows: 


1. The pulse rate of the subject is taken standing for thirty
seconds, or longer if there seems to be much nervousness. The rate
per minute is recorded as A in Table III.


TABLE III
FOSTER'S RATING SCHEME FOR PHYSICAL EFFICIENCY

A                      B minus A                  C minus A
Pulse rate before      Difference in pulse rate   Difference in pulse rate
test                   before and immediately     before and after 45
                       after test                 seconds' rest, standing

Rate         Points    Difference    Points       Difference    Points
100 or less     0
101 to 105     -1
106 to 110     -2      0 to 20         15
111 to 115     -3      21 to 30        13         1 to 5          -1
116 to 120     -4      31 to 40        11         6 to 10         -2
121 to 125     -5      41 to 50         9         11 to 15        -3
126 to 130     -6      51 to 60         7         16 to 20        -4
131 to 135     -7      61 to 70         5         21 to 25        -5

5Foster, W. L., "A Test of Physical Efficiency," Am. Phys. Educ. Rev., XIX
(December, 1914), 632. See also: Williams, J. F., The Organization and
Administration of Physical Education, p. 294. New York, The Macmillan
Company, 1925.


2. The subject then runs in place for exactly thirty seconds by a 
stop watch at the rate of 180 steps per minute. While standing at 
ease, the subject’s pulse rate is taken immediately after cessation of 
the run, for five seconds (or for fifteen seconds until the observer 
becomes proficient in taking pulse rate and using a stop watch). 
The rate per minute is recorded as B. 

3. After the subject has stood at ease for forty-five seconds, the pulse
rate is again taken and the rate per minute is recorded as C.

4. The values from the table are then added and the total, 
observing minus signs, represents the final mark. Fifteen is the 
maximum number of points possible to obtain. 

Example: 

A pupil has a standing pulse rate at ease of 82, which gives a score
of 0 points in A. His pulse rate immediately after completing the
stationary run is 106, which is 24 beats higher than before the run,
and hence he will get a rating of 13 points in the "B minus A" table.
After forty-five seconds his pulse rate has come down to 89, which is
still seven beats more than his original "standing at ease" pulse and
gives him -2 points in the "C minus A" table. His efficiency
rating then is 0 + 13 + (-2), which is 11 out of a possible 15.
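The scoring steps above reduce to three band lookups and a sum. The following minimal sketch encodes the point bands as they appear in Table III and reproduces the worked example. The zero-point row for a C minus A difference of 0 is an assumption, inferred from the statement that fifteen is the maximum obtainable score; differences that fall outside the printed bands raise an error rather than being guessed. The names are hypothetical.

```python
# (lowest, highest, points) bands transcribed from Table III.
A_POINTS = [(0, 100, 0), (101, 105, -1), (106, 110, -2), (111, 115, -3),
            (116, 120, -4), (121, 125, -5), (126, 130, -6), (131, 135, -7)]
B_MINUS_A_POINTS = [(0, 20, 15), (21, 30, 13), (31, 40, 11),
                    (41, 50, 9), (51, 60, 7), (61, 70, 5)]
# The (0, 0, 0) row is an assumption inferred from the stated maximum of 15.
C_MINUS_A_POINTS = [(0, 0, 0), (1, 5, -1), (6, 10, -2), (11, 15, -3),
                    (16, 20, -4), (21, 25, -5)]


def points(table, value, label):
    """Points for the band containing value."""
    for low, high, pts in table:
        if low <= value <= high:
            return pts
    raise ValueError(f"{label} = {value} is outside Table III")


def rate_per_minute(count, seconds):
    """Convert a pulse count over a short interval to whole beats per minute."""
    return round(count * 60 / seconds)


def foster_score(a, b, c):
    """a, b, c: pulse rates per minute before, immediately after, and 45 s after the run."""
    return (points(A_POINTS, a, "A")
            + points(B_MINUS_A_POINTS, b - a, "B minus A")
            + points(C_MINUS_A_POINTS, c - a, "C minus A"))


print(foster_score(82, 106, 89))  # 11, the pupil in the worked example
```

A five-second count of 9 beats taken immediately after the run would be entered as `rate_per_minute(9, 5)`, that is, 108.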

The Barach Test.6 "The energy index": this test is designed
to yield an index of efficiency based upon several determinations of
the systolic and diastolic blood pressures and the pulse rate, made
during a period of one minute. Both pressures are multiplied by
the heart rate and the products added, yielding an arbitrary value
from which the last two digits are cut off to reduce the index number.

Example:
125 (systolic pressure) X 72 (heart rate) =  9,000
 85 (diastolic pressure) X 72 (heart rate) =  6,120
                                     Index = 15,120, or 151

A large number of measures indicate that a robust person will
show an index varying from 110 to 160. The upper normal limit is
considered to be 200 and the lower limit 90. That is, any person
showing over 200 presents a condition known as hypertension, while
one showing under 90 presents a condition of hypotension. In
presenting this test, Barach says that we should consider all three
elements "in order to get an idea of the cardiovascular energy
expended and of the strain under which the heart and the blood
vessels are laboring."

6Barach, J. H., "The Energy Index," Jr. Am. Med. Assoc., LXII (February 14,
1914), 525.
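Barach's index is a single line of arithmetic: because the last two digits are dropped, it equals the integer quotient of the summed products divided by 100. The minimal sketch below (hypothetical names) uses one set of readings, which could be the average of the several determinations the text calls for, and applies the limits quoted above.

```python
def energy_index(systolic: int, diastolic: int, heart_rate: int) -> int:
    """(systolic x rate) + (diastolic x rate), with the last two digits dropped."""
    return (systolic * heart_rate + diastolic * heart_rate) // 100


def classify_energy_index(index: int) -> str:
    if index > 200:
        return "hypertension"
    if index < 90:
        return "hypotension"
    if 110 <= index <= 160:
        return "robust range"
    return "within normal limits"


idx = energy_index(125, 85, 72)
print(idx, classify_energy_index(idx))  # 151 robust range
```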

The Barringer Test.7 Working along the same lines as the
German physiologists Graupner and Katzenstein and the Danish
physiologists Krogh and Lindhard (cited by Burton-Opitz),8 Barringer9
discovered that individuals who were physically deficient showed a
delayed rise in blood pressure after the completion of an exercise;
that is, the pressure does not gradually return to normal but may
rise still higher than the level reached immediately after the
exercise. The principle involved in the Barringer test is to subject
the individual to a regularly increasing amount of work, measured
in foot pounds, in order to ascertain whether he can be made to show
a "delayed rise" in blood pressure. The work to which the individual
is subjected consists in raising a dumb-bell from the floor to
full extension of the arms over the head. The subject stands with
legs apart, grasps the bell (placed between his feet) with both hands,
raises it to the level of his shoulders and then pushes it above his
head to full extension of the arms. It is then quickly lowered to the
floor in a single motion. The swing is repeated to obtain the required
number of foot pounds of work. Since the weight of the bell may be
varied, the total amount of work done at each swing can be made
to yield a rather exact figure. The work, however, must be completed
within a specific length of time; otherwise the rest periods will tend
to diminish its effect. One swing per second is the required speed.

To calculate the number of swings of the bell for a required
number of foot pounds, proceed as follows:

(1) The subject should stand with legs apart, as in the test, facing
the wall. Obtain his reach by measuring the distance between his
clenched hands, at full extension of the arms overhead, and the floor.
Add to this figure one and one-half feet and multiply the total by the
weight of the bell. To this add one half the body weight to allow for
raising the body to the erect position.

(2) Divide the number of foot pounds of work by the figure 
obtained in (1) and the result will be the number of swings required. 


7Barringer, T. B., Jr., "The Circulatory Reaction to Graduated Work as a Test
of the Heart's Functional Capacity," Archives of Internal Medicine, XVII
(March, 1916), 365.

8Burton-Opitz, R., op. cit.

9Barringer, T. B., Jr., "Studies in the Heart's Functional Capacity as Estimated
by the Circulatory Reaction to Graduated Work," Archives of Internal
Medicine, XVII (May, 1916), 670.


Example:
Reach ......................................... 7 ft.
Add ........................................... 1½ ft.
Total ......................................... 8½ ft.
Multiply by weight of bell .................... 10 lbs.
Add one half body weight
Divide 2900 foot lbs. work required by
Number of swings required
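Steps (1) and (2) can be expressed as two functions. This minimal sketch (hypothetical names) keeps the reach of 7 ft. and the 10-lb. bell from the example; the body weight of 150 lbs. and the 2500 foot pounds of work are hypothetical values chosen only for illustration, since the corresponding figures in the example are not legible. Rounding the result up to a whole swing is likewise an assumption.

```python
import math


def foot_pounds_per_swing(reach_ft: float, bell_lb: float, body_weight_lb: float) -> float:
    """Step (1): (reach + 1.5 ft.) x bell weight, plus half the body weight."""
    return (reach_ft + 1.5) * bell_lb + body_weight_lb / 2


def swings_required(work_ft_lb: float, reach_ft: float, bell_lb: float,
                    body_weight_lb: float) -> float:
    """Step (2): divide the required work by the work done in one swing."""
    return work_ft_lb / foot_pounds_per_swing(reach_ft, bell_lb, body_weight_lb)


# Reach and bell weight from the example; body weight and work are hypothetical.
n = swings_required(2500, reach_ft=7, bell_lb=10, body_weight_lb=150)
print(n, math.ceil(n))  # 15.625 16
```

At the required speed of one swing per second, sixteen swings would take about sixteen seconds.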


The heart rate and systolic blood pressure are simultaneously 
determined before the exercise and these determinations are repeated 
several times to make sure that normal conditions are present. The 
definite amount of work is performed and readings again taken every 
thirty seconds for three to five minutes. The amount of work is 
increased daily until the point is reached at which the individual 
shows the “delayed rise.” 
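The criterion the test looks for, a "delayed rise," can be checked mechanically once the post-exercise readings are recorded. This minimal sketch assumes the first reading is taken directly after the work and the rest at thirty-second intervals, and it flags any later systolic reading above the immediate one. It applies no allowance for measurement error, which a practical version would probably need. The function name is hypothetical.

```python
from __future__ import annotations


def shows_delayed_rise(systolic_after_exercise: list[float]) -> bool:
    """readings[0] is taken directly after the work; the rest every 30 seconds."""
    if len(systolic_after_exercise) < 2:
        raise ValueError("need the immediate reading plus at least one later reading")
    immediate = systolic_after_exercise[0]
    return any(p > immediate for p in systolic_after_exercise[1:])


normal = [140, 134, 128, 124, 120]   # steady decline toward 120
delayed = [140, 146, 150, 142, 132]  # climbs above the immediate reading
print(shows_delayed_rise(normal), shows_delayed_rise(delayed))  # False True
```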


Experience has shown that perfectly normal persons of about
twenty years of age are able to endure 2500 foot pounds of work,
executed within a period of thirty seconds, with a rise in systolic
pressure of about 20 mm. Hg and an increase in heart rate of about
20 beats. Thus, a person showing a normal arterial pressure of 120
mm. Hg and a pulse rate of 80 per minute would reveal directly
after the exercise a pressure of about 140 mm. Hg and a cardiac
frequency of about 100. Then follows a steady decline to normal,
the total time required for this change being about two minutes.
The following day the person may do 3000 foot pounds of work, and
so on until 4000 or even 5000 foot pounds have been reached.
However, great care should be exercised to adapt the amount of work to
the person's general condition, so as not to overtax his vascular
system.10


There is no scale available at the present time for reading the
results, since normal cardiac capacity varies rather widely in
different people. However, Barringer has provided here a test which
estimates the functional capacity of the heart.

The Schneider Test.11 One earlier test had evaluated the rise in
heart rate and pressure on standing (Crampton), and another had
added the effect of exercise (Foster), but all of these lacked
completeness. In a study calculated to give a more accurate picture
of cardiovascular function, Schneider has presented an ingenious test.

10Burton-Opitz, R., op. cit.
11Schneider, E. C., "A Cardiovascular Rating as a Measure of Physical Fatigue
and Efficiency," Jr. Am. Med. Assoc., LXXIV (May 29, 1920), 1507.

TABLE IV
TABLES FOR SCORING SCHNEIDER'S CARDIOVASCULAR TEST

Part A                      Part B
Reclining pulse rate        Pulse rate increase on standing
                            0-10    11-18   19-26   27-34   35-42
Rate        Points          Beats   Beats   Beats   Beats   Beats
                            Points  Points  Points  Points  Points
 50- 60        3              3       3       2       1       0
 61- 70        3              3       2       1       0      -1
 71- 80        2              3       2       0      -1      -2
 81- 90        1              2       1      -1      -2      -3
 91-100        0              1       0      -2      -3      -3
101-110       -1              0      -1      -3      -3      -3

Part C                      Part D
Standing pulse rate         Pulse rate increase immediately after exercise
                            0-10    11-20   21-30   31-40   41-50
Rate        Points          Beats   Beats   Beats   Beats   Beats
                            Points  Points  Points  Points  Points
 60- 70        3              3       3       2       1       0
 71- 80        3              3       2       1       0       0
 81- 90        2              3       2       1       0      -1
 91-100        1              2       1       0      -1      -2
101-110        1              1       0      -1      -2      -3
111-120        0              1      -1      -2      -3      -3
121-130        0              0      -2      -3      -3      -3
131-140       -1              0      -3      -3      -3      -3

Part E                                  Part F
Return of pulse rate to standing        Systolic pressure standing
normal after exercise                   compared with reclining
Seconds                     Points      Change in mm.           Points
0-30                           3        Rise of 8 or more          3
31-60                          2        Rise of 2-7                2
61-90                          1        No rise                    1
91-120                         0        Fall of 2-5                0
After 120: 2-10 beats above   -1        Fall of 6 or more         -1
  normal
After 120: 11-30 beats above  -2
  normal

In this test two or three determinations of the heart rate are made
while the individual is in the reclining position. The readings
should cover a period of at least five minutes. Part A of the tables
gives the score for the average of these readings. An average of two
or three readings of the systolic blood pressure is taken while the
subject is in the reclining position.

After assuming the erect position, there should be a delay of two 
or three minutes before the pulse rate is recorded. Part B of the 
table will give the score for the difference between the reclining rate 
and the standing rate; Part C will give the score for the standing 
rate. Next the systolic pressure on standing is ascertained and the 
score for the difference between the reclining and standing pressure 
noted in Part F. 

Next a definite muscular exercise is performed by the subject. 
He places his right foot upon a chair (18 inches high) and at the 
command “go” raises himself, slowly and without touching any- 
thing, to the erect position so that his left foot comes to rest beside 
his right. Keeping his right foot on the chair, he returns to the 
original position. The rate of the exercise is once every three seconds 
and it is repeated so that five completions are made in fifteen seconds. 

Upon completion of the exercise the pulse rate is ascertained. It 
is counted also at sixty, ninety and one hundred and twenty seconds 
after exercise. The pulse should be counted for fifteen seconds and 
that count multiplied by four. Parts D and E of Table IV rate these 
counts. 

It will be noted that the scores for each of the six items range
between +3 and -3. A perfect record will give a score of 18, while
deficiency is indicated by a score of … or less.
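The six-part scoring can be sketched in Python. This is a minimal sketch, assuming the point values of Table IV as set out in Parts A to F; the function names are hypothetical, not Schneider's. Two gaps in the table are filled by labelled assumptions: a fall in pulse rate is scored in the smallest-increase column, and a systolic change of 1 mm. either way is treated as "no rise."

```python
from bisect import bisect_left

# Upper limits of each rate band (beats per minute), read from Table IV.
RECLINING_BANDS = [60, 70, 80, 90, 100, 110]            # lowest band starts at 50
STANDING_BANDS = [70, 80, 90, 100, 110, 120, 130, 140]  # lowest band starts at 60

PART_A = [3, 3, 2, 1, 0, -1]
PART_B_COLS = [10, 18, 26, 34, 42]        # increase on standing
PART_B = [[3, 3, 2, 1, 0], [3, 2, 1, 0, -1], [3, 2, 0, -1, -2],
          [2, 1, -1, -2, -3], [1, 0, -2, -3, -3], [0, -1, -3, -3, -3]]
PART_C = [3, 3, 2, 1, 1, 0, 0, -1]
PART_D_COLS = [10, 20, 30, 40, 50]        # increase immediately after exercise
PART_D = [[3, 3, 2, 1, 0], [3, 2, 1, 0, 0], [3, 2, 1, 0, -1], [2, 1, 0, -1, -2],
          [1, 0, -1, -2, -3], [1, -1, -2, -3, -3], [0, -2, -3, -3, -3],
          [0, -3, -3, -3, -3]]


def band(value, upper_limits, lowest):
    """Index of the first band whose upper limit is >= value."""
    if not lowest <= value <= upper_limits[-1]:
        raise ValueError(f"{value} is outside the table ({lowest}-{upper_limits[-1]})")
    return bisect_left(upper_limits, value)


def part_e(return_seconds, beats_above_at_120=0):
    """Return to standing normal; beats_above_at_120 is used only after 120 s."""
    if return_seconds <= 120:
        return [3, 2, 1, 0][band(return_seconds, [30, 60, 90, 120], 0)]
    if 2 <= beats_above_at_120 <= 10:
        return -1
    if 11 <= beats_above_at_120 <= 30:
        return -2
    raise ValueError("beats above normal at 120 s is outside the table")


def part_f(systolic_change_mm):
    """Standing minus reclining systolic pressure, in mm."""
    if systolic_change_mm >= 8:
        return 3
    if systolic_change_mm >= 2:
        return 2
    if systolic_change_mm >= -1:   # assumption: -1 to +1 mm counted as "no rise"
        return 1
    if systolic_change_mm >= -5:
        return 0
    return -1


def schneider_index(reclining_pulse, standing_pulse, post_exercise_pulse,
                    return_seconds, reclining_systolic, standing_systolic,
                    beats_above_at_120=0):
    r = band(reclining_pulse, RECLINING_BANDS, 50)
    s = band(standing_pulse, STANDING_BANDS, 60)
    # Assumption: a fall in rate is scored in the smallest-increase column.
    rise_standing = max(standing_pulse - reclining_pulse, 0)
    rise_exercise = max(post_exercise_pulse - standing_pulse, 0)
    return (PART_A[r]
            + PART_B[r][band(rise_standing, PART_B_COLS, 0)]
            + PART_C[s]
            + PART_D[s][band(rise_exercise, PART_D_COLS, 0)]
            + part_e(return_seconds, beats_above_at_120)
            + part_f(standing_systolic - reclining_systolic))


print(schneider_index(72, 84, 109, 75, 118, 122))  # 10
```

For the reading shown (reclining 72, standing 84, immediately after exercise 109, return to normal in 75 seconds, systolic 118 rising to 122 on standing) the six parts score 2, 2, 2, 1, 1 and 2, a total of 10.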

The U-Tube Manometer Test.¹² This is also called the 40 mm.
test because its chief feature is holding a column of mercury 40 mm.
high in a U-tube by the force of air breathed into one side of the tube.
The classification of such a test among the cardiovascular ratings
rests upon experimental studies by Warner and Hambly, which show
changes in blood pressure and pulse rate that can be explained by the
ability of the thoracic and abdominal blood vessels to maintain their
tone. Great variations in pulse rate and blood pressure before, during
and after the test indicate lack of physical fitness.

12 Warner, E. C. and Hambly, W. D., "An Investigation into the Physiologic Basis
of the U-tube Manometer Test," Guy's Hospital Reports, Vol. 75 (1925), p. 286.

The test is conducted as follows: the subject makes a deep 
expiration followed by the deepest possible inspiration, and then 
quickly introduces the mouthpiece between the lips and the teeth, 
which clasp the mouthpiece tightly, and blows through it until he 
is blowing against a pressure of 40 mm. of mercury. The mercury 
is raised to the required level as quickly as possible, for it is found 
to be much more strenuous to raise mercury slowly. While the 
mercury is sustained at the level of 40 mm. the subject is warned 
not to allow ballooning of his cheeks, and of course does not introduce 
his tongue into the mouthpiece. A few subjects tend to lose air 
through the nose, and these are fitted with a nose clip, but many 
subjects prefer not to use the clip, because the discomfort caused by 
it shortens the time during which the mercury can be sustained.

The mouthpiece described above is rather large and causes early
tiring of the orbicularis oris muscle; in later experiments it was
replaced by a small cigarette holder, which was found to be gripped
more readily and with less discomfort.

The test may be applied in one of two ways, either (1) the maxi- 
mum time during which the subject can sustain the mercury is tested 
by a stop watch, or (2) the mercury is held at that level for a period
(e. g., twenty seconds) and the pulse rate and blood pressure during 
and after the test are compared with those taken before the test. 

No standards of measurement can be set but great variation in 
pulse rate and blood pressure before, during and after the test can 
be taken as an indication of the degree of unfitness. 

Hambly, Pembrey and Warner consider the U-tube manometer 
test as a measure of the sensitiveness to discomfort rather than a test 
of fitness and have shown that athletes have not been able to hold 
their breath longer than nonathletes. On the other hand the whole 
mechanism of the good athlete is built up around a defense against
just such conditions as this test imposes. The response to slight 
changes is of distinct advantage to a highly trained performer. 

This test is not to be condemned altogether, for, as these
investigators point out, the poorest nonathletes and the unfit show a
wide range of variation in pulse rate and blood pressure, which
indicates lack of tone in the abdominal and thoracic blood vessels.


The Pulse-Ratio Test. The original idea for this test came from
the experimental work of a number of English investigators working
at Guy's Hospital in London.¹³ The procedure of these earlier
investigators has been modified and simplified by Tuttle at Iowa,
and a considerable amount of experimental work using the new
technique is now available. Tuttle¹⁴ first showed conclusively that
the increase in heart rate after exercise is directly proportional to
the intensity of the exercise and that the relationship between the
two is rectilinear. His next problem was the development of a
practical test for rating physical efficiency which would be of value
to the athletic coach and physical education teacher. Pulse ratio
as used by Tuttle means the ratio of the pulse rate after exercise to
the resting pulse rate; it is found by dividing the total number of pulse
beats for two minutes after a given amount of exercise by the normal
resting pulse for one minute. If, for example, the resting pulse for
one minute is 72 and the total pulse for two minutes after a given
amount of exercise is 180, the pulse ratio is 180/72, or 2.5.

Tuttle's¹⁵ technique in rating physical efficiency may be very
briefly explained as follows: 

The exercise consists of mounting a stool 13 inches high by means
of a definite technique at a definite rate per minute. Two rates of
exercise are selected, one to produce a pulse ratio less than 2.5 and
the other (much faster) to produce a pulse ratio greater than 2.5.
These two ratios are plotted on a graph and the points joined by a
straight line; one axis is steps per minute and the other the pulse
ratio. The number of steps per minute that yields a pulse ratio of 2.5
can then be read from the line. Great care must be used in
determining the pulse rate, in allowing sufficient time between the
first and second exercise, and in testing after strenuous exercise. In
determining the efficiency rating of any individual, 50 steps for one
minute is considered to represent the amount of exercise required
to produce a pulse ratio of 2.5 in a highly efficient individual.

13 Hunt, G. H. and Pembrey, M. S., "Tests of Physical Efficiency," Guy's Hospital
Reports, Vol. 71 (1921), 415-428.
Hambly, W. D., Hunt, G. H., Parker, L. E. L., Pembrey, M. S. and Warner, E. C.,
"Tests for Physical Efficiency," Guy's Hospital Reports, Vol. 72 (1922), …-985.
Hambly, W. D., Pembrey, M. S. and Warner, E. C., "The Physical Fitness of …
by Various Methods," Guy's Hospital Reports, Vol. 75 (1925), 388-394.
Campbell, J. M. H., "Weight, Vital Capacity, Pulse Rate Before and After
Exercise and Physical Fitness in Health," Guy's Hospital Reports, Vol. 75
(1925), 263.
14 Tuttle, W. W. and Wells, George, "The Response of the Normal Heart to
Exercises of Graded Intensity," Arbeitsphysiologie, Berlin (1931), 519-526.
15 Tuttle, W. W., "The Use of the Pulse-Ratio Test for Rating Physical
Efficiency," Res. Quart. Am. Phys. Educ. Assoc., Vol. II, No. 2 (May, 1931), 5-17.
The efficiency rating, when put on a percentage basis,¹⁶ will be

$$\text{Efficiency rating} = \frac{100 \times (\text{No. of steps required for a 2.5 pulse ratio})}{50}$$
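The two-trial procedure and the percentage rating can be sketched as follows. This is a hedged sketch in Python, assuming the straight-line reading described in the text; the trial values are hypothetical and the function names are illustrative only.

```python
def pulse_ratio(resting_per_min, post_exercise_two_min_total):
    """Two-minute post-exercise pulse total divided by the one-minute resting rate."""
    if resting_per_min <= 0:
        raise ValueError("resting pulse must be positive")
    return post_exercise_two_min_total / resting_per_min


def steps_for_ratio(slow_trial, fast_trial, target=2.5):
    """Read the steps per minute for `target` off the line through two trials.

    Each trial is a (steps_per_minute, pulse_ratio) pair; the slow trial should
    fall below the target ratio and the fast trial above it.
    """
    (s1, r1), (s2, r2) = slow_trial, fast_trial
    if r1 == r2:
        raise ValueError("the two trials must give different pulse ratios")
    return s1 + (target - r1) * (s2 - s1) / (r2 - r1)


def efficiency_rating(steps_for_target, standard_steps=50):
    """Percentage rating: 100 x (steps needed for a 2.5 ratio) / 50."""
    return 100 * steps_for_target / standard_steps


# Hypothetical subject: resting pulse 80; two-minute totals of 180 and 220.
slow = (30, pulse_ratio(80, 180))   # ratio 2.25
fast = (60, pulse_ratio(80, 220))   # ratio 2.75
steps = steps_for_ratio(slow, fast)
print(steps, efficiency_rating(steps))  # 45.0 90.0
```

The text's own example, `pulse_ratio(72, 180)`, gives 2.5. In the hypothetical case the line crosses 2.5 at 45 steps per minute, which is 90 per cent of the 50-step standard.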


A number of studies have been conducted using the technique
developed by Tuttle. A partial list of the conclusions is given here:


1. There are no significant differences in efficiency rating when a
comparison is made on the basis of age.¹⁷

2. The data show a high degree of relationship between the
recovery ability of the heart and the efficiency rating.¹⁸

3. The efficiency rating of athletes during the season of competition
is materially increased . . . . The most common occurrence
is a fall in the rating after athletic competition.¹⁹

4. Athletic women show a slower normal heart rate and have a
higher physical efficiency than the nonathletic type.²⁰

5. The variation in physical efficiency ratings which one may
expect in the normal individual from time to time will fall between
9 and 15 per cent.²¹

6. The pulse-ratio test is a reliable indicator of proficiency, or
inefficiency as the case may be, in performing on gymnastic
apparatus.²² Specific movements on the apparatus in question are
substituted for stool mounting. However, instead of varying the rate,
the number of trials is varied.

16 Tuttle, W. W., op. cit., p. 11.
17 Tuttle, W. W. and Skien, J. S., "The Efficiency Rating of High School Boys as
Shown by the Pulse-Ratio Test," Res. Quart. Am. Phys. Educ. Assoc., Vol. I,
No. 3 (October, 1930), 27.
18 Ibid.
19 Ibid.
20 Tuttle, W. W. and Frey, Henryetta, "A Study of the Physical Efficiency of
College Women as Shown by the Pulse-Ratio Test," Res. Quart. Am. Phys.
Educ. Assoc., Vol. I, No. 4 (December, 1930), 25.
21 Tuttle, W. W. and Frey, Henryetta, op. cit.
22 Tuttle, W. W. and Wilkins, R. C., "The Application of the Pulse-Ratio Test to
Efficiency in Performing on Gymnasium Apparatus, The Horizontal Bar,"
Arbeitsphysiologie, Berlin (1930), 449-455.
Schroeder, E. G. and Tuttle, W. W., "The Application of the Pulse-Ratio Test
to Efficiency in Performing on Gymnasium Apparatus, The Parallel Bars,"
Arbeitsphysiologie, Berlin (1931), 443-452.

7. Menstruation does not bring about a cyclic rise and fall in
physical efficiency.²³

8. The pulse-ratio technique is reliable in detecting noncompensated
organic heart lesions.²⁴

9. The pulse-ratio test appears to be a reliable criterion for
endurance since "there is a high correlation between the physical
efficiency rating as measured by the pulse-ratio test and endurance
in sprint running."²⁵

10. In a further study on this point, Henry and Kleeberger
found lower correlations and concluded that: "General muscular
endurance is not a determining factor in the pulse-ratio test. Initial
strength may be a small positive factor."²⁶

11. In subsequent experimental work Tuttle and Dickinson²⁷
found a .93 correlation between Tuttle's two-point pulse ratio and a
simplified pulse ratio. The latter was based upon the sitting pulse
counted for two minutes after thirty steps per minute on a 13-inch
stool, divided by the sitting pulse rate before exercise. The correlation
was .957 when the exercise was increased to forty steps per minute.

12. Tuttle and Charlesworth²⁸ concluded that rope jumping was
as adequate an exercise as chair stepping in tests of this type.

13. Morehouse and Tuttle²⁹ show the need for the exercise to
be strenuous if it is to be used to differentiate among individuals
upon the basis of post-exercise pulse rate or recovery rate.

23 Scott, Gladys and Tuttle, W. W., "The Periodic Fluctuation in Physical
Efficiency During the Menstrual Cycle," Res. Quart. Am. Phys. Educ. Assoc.,
Vol. III, No. 1 (March, 1932), 144.
24 Sievers, Henry, "A Simple Method of Detecting Abnormal Hearts by the Use of
the Pulse-Ratio Test," Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2
(May, 1935), 36.
25 Flanagan, Kenneth, "The Pulse-Ratio Test as a Measure of Athletic Endurance
in Sprint Running," Supp. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI,
No. 3 (October, 1935), 50.
26 Henry, F. M. and Kleeberger, F. L., "The Validity of the Pulse-Ratio Test of
Cardiac Efficiency," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. IX, No. 1 (March, 1938), 32-46.
27 Tuttle, W. W. and Dickinson, R. E., "A Simplification of the Pulse-Ratio
Technique for Rating Physical Efficiency and Present Condition," Res. Quart.
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 2 (May, 1938), 75-81.
28 Tuttle, W. W. and Charlesworth, John E., "A Study of the Standardization of
Exercise for Use in the Pulse-Ratio Test," Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. X, No. 1 (March, 1939), 150-155.
29 Morehouse, L. E. and Tuttle, W. W., "A Study of the Post-Exercise Heart
Rate," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1
(March, 1942), 3-9.

14. Phillips et al.³⁰ concluded that Tuttle's Pulse-Ratio Test
had too low a reliability coefficient to justify its use
with college women.

The Harvard Step Test.³¹ Brouha and colleagues developed
a test in the Harvard Fatigue Laboratories during World War II to
measure "the general capacity of the body to adapt itself to hard
work and to recover from what it has done." The test differs from
previously described pulse-ratio tests in that the pre-exercise pulse
rate is not considered.

The Harvard Step Test consists in having the subject step up and
down a 20-inch platform thirty times a minute for five minutes,
unless he stops from exhaustion before then. The pulse is counted
from 1 to 1½, 2 to 2½, and 3 to 3½ minutes after the work stops.
The score is obtained by dividing 100 times the duration of the
exercise in seconds by twice the sum of the pulse counts in recovery,
according to the formula:


$$\text{Index} = \frac{\text{Duration of exercise in seconds} \times 100}{2 \times \text{sum of pulse counts in recovery}}$$

The meaning of the figures thus obtained is as follows: below 55,
poor physical condition; from 55 to 64, low average; from 65 to
79, high average; from 80 to 89, good; above 90, excellent.
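A minimal Python sketch of the Harvard index and its verbal scale follows. The recovery counts are hypothetical, and how scores falling between the printed bands (for example 64.5, or exactly 90) are classed is an assumption of the sketch, not part of the published scale.

```python
def harvard_index(duration_s, recovery_counts):
    """(Duration of exercise in seconds x 100) / (2 x sum of the recovery counts).

    recovery_counts holds the three half-minute counts taken from 1 to 1 1/2,
    2 to 2 1/2 and 3 to 3 1/2 minutes after the work stops.
    """
    if len(recovery_counts) != 3:
        raise ValueError("expected three recovery pulse counts")
    return duration_s * 100 / (2 * sum(recovery_counts))


def rating(index):
    """Verbal rating from the scale in the text (boundary handling is assumed)."""
    if index < 55:
        return "poor"
    if index < 65:
        return "low average"
    if index < 80:
        return "high average"
    if index < 90:
        return "good"
    return "excellent"


score = harvard_index(300, [80, 70, 60])  # full five minutes, hypothetical counts
print(round(score, 1), rating(score))     # 71.4 high average
```

A subject who stops early simply has a smaller `duration_s`, which lowers the index in proportion.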


This test is considered to be useful in separating the least fit from
the fit, and the fit from the very fit, so as to provide each group in turn
with conditioning programs to meet its needs.³² In studies on Harvard
undergraduates it was found that the test produced higher and less
variable scores in athletes in training, and that scores improved under
training and decreased after termination of training.³³

The test was originally validated against the criterion of a work
index based upon three factors: endurance in treadmill running,
maximum heart rate per minute and blood lactate level. This test
has been widely used in recent years both in physical fitness testing
programs and in research studies.

30 Phillips, Marjorie, Ridder, Eloise and Yaekel, Helen, "Further Data on the
Pulse-Ratio Test," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 14, No. 4 (December, 1943), 425-429.
31 Brouha, Lucien, "The Step Test: A Simple Method of Measuring Physical
Fitness for Muscular Work in Young Men," Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 14, No. 1 (March, 1943), 31-35.
32 Gallagher, J. R. and Brouha, Lucien, "A Simple Method of Testing the Physical
Fitness of Boys," ibid., 23-30.
33 Brouha, Lucien, Fradd, Norman and Savage, Beatrice M., "Studies in Physical
Efficiency of College Students," Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 15, No. 3 (October, 1944), 211-224.
Brouha and Gallagher³⁴ describe a modification of the step test
to appraise the functional fitness of high school girls. This test
uses a 16-inch platform, and the duration of the test is
limited to four minutes.
Clarke³⁵ utilizes the principles of the Brouha Step Test and
recommends for college women that the bench be 18 inches high, that a
step be completed every two seconds for a maximum of four minutes,
and that the pulse be counted from one to one and a half, two to two
and one-half, and three to three and one-half minutes after cessation
of the exercise.
Pack Test of Exercise Tolerance.³⁶ This maximal pack test
was developed during World War II for the purpose of quickly
testing large groups of men for fitness and endurance in heavy
physical exertion. The test offers several variations from the other step
tests described thus far. While an 18-inch bench is used, the subject
grasps a crossbar above the bench with the left hand, steps on the
bench with the left foot, and on the signal to start comes to a full
vertical stand on the bench. The right foot is also used for the
descent, and every half minute the subject changes to the opposite
leg and continues the exercise without breaking the rhythm of forty
ascent-descent cycles per minute. In addition the subject carries a
packsack in which a 10-pound weight is placed at the start, and
additional 10-pound weights are added every two minutes thereafter. The
subject continues the exercise until he can no longer execute it at the
required rate. The total time the exercise is continued is taken, and
the heart rate is counted from ten to thirty seconds after the end of
the work. This exercise brings even exceptionally well-conditioned
men to exhaustion within ten minutes. The final score is the total
time the exercise is continued. The heart rate is used as a check to
ascertain whether the subject has put forth adequate effort.

34 Brouha, Lucien and Gallagher, J. Roswell, "A Functional Fitness Test for High
School Girls," Jr. of Health and Phys. Educ., Vol. 14, No. 10 (December, …).
35 Clarke, Harriet, "A Functional Physical Fitness Test for College Women,"
Jr. Health and Phys. Educ., Vol. 14, No. … (… 1943), ….
36 Taylor, Craig, "A Maximal Pack Test of Exercise Tolerance," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 4 (December, 1944),
291-302.


This test has the advantage of distinguishing between individuals
in the upper ranges of scores, while other step tests are submaximal
for the best-conditioned men. Since height is a slight advantage in
this test, height corrections accompany the scoring table.

McCloy's Test of Present Condition.³⁷ In an attempt to find
a more effective test of present condition, McCloy studied the
relationship of fifteen variables with a criterion using a good-condition
group and a poor-condition group. The technique of bi-serial "r"
was used throughout. The three variables which have the most
significance in a formula for estimating …


37 McCloy, C. H., "A Cardio-V…," Arbeitsphysiologie, Berlin, Vol. IV, No. 2
(November, 1930). Tests and Measurements in Health and Physical Education,
pp. 248-249, New York, …, 1942.
… "Measurements of Organic Efficiency for … of Physical Condition," Supp. to
Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2 (May, 1935), 11-41.

… of this study it was necessary to prove that blood … the method of
examination, … group) and (2) swimmers of … condition group), and
correlation coefficients were computed by means of the bi-serial "r"
method.⁴⁰ Combining the five items into a multiple relationship for
predicting organic efficiency, the resulting correlation coefficient is
extremely high (R = .947). The five items selected and weighted
according to their contribution are: (1) sitting diastolic pressure,
(2) breath holding twenty seconds after standard stair-climbing exercise,
(3) difference between standing normal pulse rate and pulse rate two
minutes after exercise, (4) standing pulse pressure, and (5) vital
capacity. To obtain an index score, raw scores on the five items are
changed to T-scores and these, when multiplied by weights, are added.
Standards of organic condition have been set up which enable the
examiner to classify subjects tested.
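The index arithmetic (raw score to T-score, then a weighted sum) can be sketched as below. This is an illustration only: the weights and the direction in which each item is scored are placeholders, since neither is reproduced here, and the T-scores are computed from the tested group itself rather than from published norms.

```python
from statistics import mean, stdev


def t_scores(raw_scores, higher_is_better=True):
    """T-scores (mean 50, SD 10) for one item, computed from the tested group."""
    m, s = mean(raw_scores), stdev(raw_scores)
    sign = 1 if higher_is_better else -1
    return [50 + sign * 10 * (x - m) / s for x in raw_scores]


def index_score(t_by_item, weights):
    """Weighted sum of one subject's T-scores."""
    return sum(weights[item] * t for item, t in t_by_item.items())


# Hypothetical group of three subjects, two of the five items only.
vital_capacity = t_scores([3.0, 4.0, 5.0])                   # [40.0, 50.0, 60.0]
diastolic = t_scores([60, 70, 80], higher_is_better=False)   # [60.0, 50.0, 40.0]
WEIGHTS = {"vital_capacity": 1.0, "sitting_diastolic": 0.5}  # placeholders
first = {"vital_capacity": vital_capacity[0], "sitting_diastolic": diastolic[0]}
print(index_score(first, WEIGHTS))  # 70.0
```

Converting to T-scores first puts items measured in litres, millimeters of mercury and beats per minute on one common scale before they are weighted and added.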
In the original study the biserial "r" obtained between the test
and the groups used as the criterion was .853. Subsequent studies,⁴¹
however, revealed the correlations obtained by this method to be
spuriously high, since the normal group was omitted from the
sampling and correlations were based on the two extremes of a
theoretical normal curve. Also, fever patients produce a distorted
weighting for sitting diastolic pressure, which is not typical of or
applicable to normal subjects. In a later study by McCurdy and
Larson,⁴² in which infirmary patients, varsity swimmers and Olympic
swimmers were used, the biserial "r" was found to be .70. A validity
coefficient of .68 was found for the Index in predicting score in
seconds for the 440-yard swim for selected college students. In this
study a different weighting was given to sitting diastolic pressure in
the formula.

Olds⁴³ concluded that the McCurdy-Larson Organic Efficiency
Test does not measure differences in condition developed by five
months' participation in basketball, and that its reliability is
secured only under carefully controlled circumstances.

40 Kelley, Truman L., Statistical Method, pp. 245-249. New York, The Macmillan
Company, 1924.
41 Cureton, Thomas K., Jr., Physical Fitness Appraisal and Guidance, p. 299.
42 McCurdy, J. H. and Larson, L. A., "… Validity of Circulatory-Respiratory …
of Endurance Condition in …," Res. Quart. …, Vol. XI, No. 3 (October, …), 5-11.
43 Olds, …, "The Effect of Competitive Basketball Upon the Physical Fitness of
High School Boys as Determined by the McCurdy-Larson Organic Efficiency
Tests," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 12, No. 2 (May, 1941), 254-265.

Tests of Circulatory Fitness. The circulatory response to …

second."⁴⁶ After such a blow begins there is an initial drop in
systolic pressure and a subsequent recovery of this pressure to its
original level. It has been found that when the recovery is delayed
beyond thirty-five seconds, sluggish abdominal tone is indicated.
However, since this delay is susceptible to emotional disturbances,
no attention should be paid to delays of less than fifteen seconds.
Circulatory irritability may be measured by noting the total rise of
pressure in millimeters and comparing it with the length of the blow
in seconds. If the rise is greater than the length of the blow, excessive
irritability is indicated. When the maximum length of blow is less
than forty seconds, shortness of breath is indicated. It has been
found valuable, also, to follow the course of the systolic pressure
during a blow made two and one-half minutes after the per… "If we are
… the circulation, it must be along some such lines as the flarimeter
test."
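The three interpretive rules for a single blow can be expressed as a small Python check. It is a sketch, assuming that the same blow supplies the length, the rise in pressure and the recovery delay; the function name is hypothetical.

```python
def flarimeter_flags(blow_seconds, rise_mm, recovery_delay_seconds):
    """Apply the three interpretive rules stated in the text to one blow."""
    flags = []
    if recovery_delay_seconds > 35:        # delayed return of systolic pressure
        flags.append("sluggish abdominal tone")
    if rise_mm > blow_seconds:             # rise (mm) exceeds blow length (s)
        flags.append("excessive circulatory irritability")
    if blow_seconds < 40:                  # blow cannot be held forty seconds
        flags.append("shortness of breath")
    return flags


print(flarimeter_flags(blow_seconds=30, rise_mm=45, recovery_delay_seconds=40))
print(flarimeter_flags(blow_seconds=60, rise_mm=20, recovery_delay_seconds=10))  # []
```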


I. Michigan Pulse Rate Test for …

44 Wells, Philip V., "Flarimeter Tests of Circ…," … Phys. Educ. Assoc., Vol. V,
No. 4 (December, …), ….
45 Ibid., p. 48.
46 Ibid., p. 46.
47 Wells, Philip V., op. cit., p. 48.
48 "Physical Education in the State of Michigan," Am. Phys. Educ. Rev., XXV
(April, 1920), 158.
single test is infallible, it was thought that this test was one of the
best for use with children thirteen to sixteen and above, because
a strong, regular pulse and prompt recovery to normal rate after
exercise usually go with first-class physical condition.
Furthermore, slow recovery indicates lack of training in exercise.
“When the pulse slows down after exercise irregularly, or when it is 
slower after exercise than normal, general weakness or some disorder 
is to be suspected.” Teachers should beware of an irregular pulse 
and advise children having such a condition to consult a physician. 
To facilitate the administration of this test teachers may instruct 
children how to count their own pulse and group them in a room in 
such a manner that they may record their own pulse rate. 

1. Have children count pulse while standing at ease before taking 
exercise, recording their pulse rates on blackboard. A good method 
is to count the pulse for fifteen seconds and multiply by four. Use a 
stop watch or, if one is not available, the second hand of a watch. 

2. Have the class run in place at the rate of three steps per second 
for fifteen seconds, lifting the feet at least 6 inches from the floor. 

3. Allow the children to stand at ease in position for three minutes, 
counting the pulse rate as follows: 

Beginning exactly one-half minute after exercise ceases. 
Beginning exactly one minute after exercise ceases. 
Beginning exactly two minutes after exercise ceases. 
Beginning exactly three minutes after exercise ceases. 

4. Rate each child according to the scale below: 

If pulse is irregular after run, drop one grade down the scale. 


Time to recover normal     Grade   Degree of fitness   Physical habits or type
One-half minute              A       Fine                Athletic
One minute                   B       Good                Active
Two minutes                  C       Fair                Moderately active
Three minutes                D       Poor                Sedentary
Pulse slower after run       E       Very poor           Invalid
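The Michigan rating can be sketched in Python as follows. The `tolerance` parameter, which decides how close to the normal rate counts as "recovered," is a hypothetical addition of this sketch (the scale itself does not define one), as is returning no grade when the pulse has not recovered within three minutes.

```python
GRADES = "ABCDE"
CHECKPOINTS = [(0.5, "A"), (1, "B"), (2, "C"), (3, "D")]  # minutes after the run


def per_minute(fifteen_second_count):
    """Count for fifteen seconds and multiply by four."""
    return fifteen_second_count * 4


def michigan_grade(normal_rate, rates_after_run, irregular=False, tolerance=4):
    """Grade A-E from the scale; returns None if not recovered within three minutes.

    rates_after_run maps 0.5, 1, 2 and 3 (minutes) to beats per minute.
    `tolerance` (hypothetical) is how close to normal counts as "recovered".
    """
    grade = None
    for minutes, letter in CHECKPOINTS:
        rate = rates_after_run[minutes]
        if rate < normal_rate - tolerance:   # pulse slower after run than normal
            grade = "E"
            break
        if rate <= normal_rate + tolerance:  # back to normal at this checkpoint
            grade = letter
            break
    if grade is not None and irregular:      # irregular pulse: drop one grade
        grade = GRADES[min(GRADES.index(grade) + 1, len(GRADES) - 1)]
    return grade


normal = per_minute(19)  # 76
after = {0.5: 120, 1: 96, 2: 80, 3: 76}
print(michigan_grade(normal, after), michigan_grade(normal, after, irregular=True))  # C D
```

Here the pulse is back within four beats of the normal 76 at the two-minute count, giving C, or D if the pulse was irregular after the run.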


II. California Group Functional Test.⁴⁹ In most schools an
immediate complete physical examination of all children is not
possible.

49 Stolz, H. R., "Group Functional Tests," Circular Letter M30, November 7,
1923. Sacramento, California State Board of Education, Department of
Physical Education.

To overcome this difficulty and to weed out the seemingly
unfit, certain tests have been devised which can be given to large 
groups in a very short period of time. A good example of this type 
of testing is that suggested for use in the state of California. One 
feature of this is a pulse rating which is recognized generally as one 
of the best indices of organic efficiency. General organic efficiency 
may be estimated by such tests as: 


1. Body weight in relation to age and height; see tables and
instructions issued by the American Child Health Association.

2. Breath-holding test: children stand with backs to blackboard. 
After explanation of the undertaking, leader directs children to face 
blackboard. After one or two deep breaths children inhale simul- 
taneously and hold breath while leader counts aloud the elapsing 
seconds. Each child indicates by writing on the blackboard the 
seconds elapsed between the inhalation and the first exhalation. 
It is important that children do not watch each other. 

3. Pulse rate return following exercise: the phenomenon of the
pulse is explained to the children, and the way to feel the pulse is
demonstrated by the leader and practiced by the children. The
interest is to be kept objective as far as possible. The normal
wide variation in pulse rate among individuals is stressed. The
normal variation in each individual after exercise is explained.
Children stand at ease facing the blackboard. At a signal from the
leader, each commences to count his pulse. After thirty seconds,
the leader says stop and each child writes on the blackboard the number …

… mile run in three and a half … vity has been explained the …

(a) The names of those children who are excused from the test
on their own request.

(b) The names of those children who give up during the test. 

(c) The names of those children whom the leader thought best to 
stop during the test. 

(d) The names of those children who evinced marked breathless- 
ness after the test. 

(e) The names of those children who showed marked fatigue after 
the test. 

5. Corroborative evidence: such as pallor, listlessness, skin 
eruptions, etc., should be recorded on the record sheet. 

At the present time physical education teachers are badly in need 
of a battery of functional tests for elementary and high school pupils 
which can be applied to groups of twenty to thirty students at a 
time and by which the instructor may quickly sort pupils into one 
of two groups: (1) those who need immediate physical examination 
by a physician, and (2) those who may apparently go ahead with 
the physical education program until such time as the regular 
physical examination can be given. 

Typical Comments on Cardiac Functional Tests. In making
a comparison of the Crampton and Schneider tests, Scott⁵⁰ studied
the records of 410 men examined for flying and found a mean score
of 11 by the Schneider test. He then checked the efficiency score
against the official flying examination and found that the Schneider
test indicated a rather clear line of demarcation between the men
who were and were not qualified. There was no such clear line of
demarcation when the Crampton test was used. A score of 7 in the
Schneider test indicates improper functioning of the neurocirculatory
apparatus, while a man with a score of 9 should be given a thorough
physical examination to determine whether his condition is due to
disease or to insufficient exercise. Scott feels, however, that the
Schneider test should be used in conjunction with a thorough physical
examination and that its results should not be accepted unless this is done. Further,
he indicates that there are two conditions which no test of physical 
efficiency yet devised will reveal: 

(1) Bradycardia, on account of the low pulse rate, will give a 
better rating than the condition warrants; 

(2) Those who are disturbed physically by a physical examination 
will get a lower rating than they deserve on account of the high 
pulse rate. 
50 Scott, V. T., "The Application of Certain Physical Efficiency Tests," Jr. Am.
Med. Assoc., LXXVI (March 12, 1921), 705.


Brittingham and White⁵¹ say that none of the tests such as
Schneider's, Crampton's and Barach's is of much value because
they do not indicate the condition of the myocardium or of the
general cardiovascular apparatus. This conclusion was reached
after a thorough trial on several hundred ward patients having no
pathologic condition of the heart or lungs. Commenting on the
Barringer test, they say that the delayed rise in systolic blood
pressure seems to depend on general muscular condition rather than
on what might be guessed as the individual's true "cardiac reserve."

Schneider⁵² in his search for a trustworthy test of condition made
a statistical study of several hundred cases, using the Crampton
test, and found it to be unsatisfactory "because of the fact that
physical deterioration may be manifest in various ways in the 
cardiovascular mechanism." A similar criticism may be made of 
Foster's method. 

Later he discusses physical fitness tests in general,⁵³ classifying
them under two main heads: (1) performance tests and (2) nonperformance
tests. … since they require considerable care and
time for the elimination of known disturbing factors.

The old strength and endurance tests, weight lifting and running
to exhaustion, revealed maximum capacity but were not trustworthy
health guides. Strength tests in general do not permit us
to draw satisfactory conclusions regarding the efficiency of the entire
body, only of the skeletal musculature, since physical exertion overtaxes
the circulatory mechanism long before it exhausts the skeletal
musculature.


He points out that "the need of a test which would determine
the heart's working capacity or reserve power was keenly felt by
cardiologists."

51 Brittingham, H. H. and White, P. D., "Cardiac Functional Tests," Jr. Am.
Med. Assoc., LXXIX (December 2, 1922), 1901.
52 Schneider, E. C., op. cit.
53 Schneider, E. C., "Physical Efficiency and the Limitation of Efficiency Tests,"
Mind and Body, XXX (July, 1923), 146.

Our search for a fitness test has been narrowed down to the possibility
of using the changes in the blood, respiration and nervous
system.

Blood tests are not likely to be employed until good and reliable
methods of determining the haemoglobin content of the blood are simplified.
Respiration tests are unpopular because they require a highly
trained gas analyst and because most of us alter our rate and depth
of breathing when we are being watched.

It has been natural for many workers in the problem of fitness to
use pulse rate and blood pressure because, as a result of poor physical
condition, the brain center which regulates the frequency of the heart
beat less effectively restrains the heart; the breathing brain center
shows mild symptoms of fatigue manifested in shallow, rapid breathing;
and the vasomotor center which controls the distribution of the
blood is decreased in tone, as is evidenced by a subnormal arterial
pressure and inability to make quick and adequate adjustments in
the redistribution of blood for changes of posture and exercise.


Cureton⁵⁴ emphasizes the fact that significant research on the
validity of various cardiovascular tests has been meager. He
concludes that the Schneider test is the best measure of cardiovascular
condition, but that the best results accrue when the test is
administered two or three times, when the subject is sufficiently relaxed and
at ease, and three to four hours after meals.⁵⁵

Henry and Herbig,⁵⁶ in an attempt to throw light on the
comparative validity of selected cardiovascular tests as measures of physical
fitness, concluded that the Schneider test has more individual
validation supporting it than any other test, and emphasized the need
for additional experimental work on the problem of comparative
validity.

Salit and Tuttle⁵⁷ concluded that blood pressure measures fail to
distinguish between healthy young adults differing in degrees of
physical fitness; that increase in pulse rate and pulse rate a half
minute after a standard exercise (in this study the bicycle ergometer)
was the most valid measure of fitness in young men; and that for young
women the measure of pulse rate two minutes after the exercise
showed most promise. They suggest that, as a basis for validation
of cardiovascular tests, differences in body size and strength be
considered in relation to a work output criterion.

⁵⁴ Cureton, Thomas K., Jr., op. cit., p. 281.
⁵⁵ Ibid., p. 299.
⁵⁶ Henry, Franklin, "Functional Tests III: Some Effects of the Common Cold on
Cardiorespiratory Adjustments to Exercise and Cardiovascular Test Scores,"
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 2 (May,
1942), 185-200.
⁵⁷ Salit, Elizabeth Powell and Tuttle, W. W., "The Validity of Heart Rate and
Blood Pressure Determinations as Measures of Physical Fitness," Res. Quart.
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 3 (October, 1944),
252-257.

Cureton et al.⁵⁸ found that simple pulse rate changes such as those used
in McCurdy's Condition Test were not [...]ance as manifested in
running. He further [...] of the all-out type do not produce extreme [...]

The need to carefully control and precisely administer
cardiovascular tests is emphasized throughout the literature. Elbel and
Green⁶⁰ found that height of bench (12, 14, 16, 18, and 20 inches)
was not a significant factor in affecting pulse rate immediately after
exercise when the cadence (in this case 24 per minute) was held
constant. Miller and Elbel⁶¹ report that the pulse rate increased
the average of 9.15 beats per minute [...] increase in the cadence
(within the range of 18 to 42 steps per minute). [...] enough to change
the fitness classification of individuals on several of the pulse-ratio
tests.

⁵⁸ Cureton, Thomas K., Jr. [...]
⁵⁹ Ibid., p. 189.
⁶⁰ Elbel, Edwin R. and Green, Earl L., "Pulse Reaction to Performing Step-Up
Exercise on Benches of Different Heights," Am. Jr. of Phys., Vol. 145, No. 2
(February, 1946), 521-527.
⁶¹ Miller, Waldo A. and Elbel, Edwin R., "The Effect Upon Pulse Rate of Various
Cadences in the Step-Up Test," Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 17, No. 4 (December, 1946), 263-269.
⁶² Slater-Hammel, A. T. and Butler, L. E., "Accuracy in Securing Pulse Rates
by Palpation," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 11,
No. 2 (May, 1940), 18-21.
⁶³ Op. cit., p. 167.

At the present time cardiovascular tests are of limited use to the
average teacher of physical education. The limits on their validity,
the administrative problems involved, and the effects on their reliability
of such factors as age, sex, temperature, climate, humidity, emotional
conditions and altitude militate against their practicality for school
testing programs. From a research standpoint, however, the area
is one of challenge to physical educators.


Selected References 


CURETON, THOMAS K., JR.: Physical Fitness Appraisal and Guidance. St. Louis,
The C. V. Mosby Company, 1947. Pp. 566.

Section III, Cardiovascular Condition, contains four chapters which give
exhaustive treatment to all ramifications of this subject. Chapter headings
are: "Appraisal of Physical Fitness by Pulse Rate Tests," "Blood Pressure
Tests of Physical Fitness," "Rating Cardiovascular Condition by the
Heartometer Pulse Wave Tests," and "Analysis and Improvement of the
Cardiovascular and Basal Metabolic Condition Tests."

CURETON, THOMAS K., JR. et al.: Endurance of Young Men. Monographs of the
Society for Research in Child Development, Vol. X, Serial No. 40, No. 1.
Washington, D. C., National Research Council, 1945.

The following two chapters are of special interest: Chapter VIII, "Analytical
and Normative Studies of the Schneider Test," and Chapter IX, "Experimental
Studies of the Schneider Test at the University of Illinois."

HENRY, FRANKLIN: "Functional Tests III: Some Effects of the Common Cold on
Cardio-respiratory Adjustments to Exercise and Cardio-vascular Test Scores,"
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 2 (May,
1942), 185-200.
Critically analyzes problems of comparative validity of cardiovascular tests,
and gives findings of tests administered to subjects suffering from colds.
HODGSON, PAULINE, LOPEZ, ALICE F., PILLIARD, MARY and NEWMAN, ANN S.:
"A Study of Some Relationships and Certain Physiological Measures Associated
with Maximal and Submaximal Work," Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 17, No. 3 (October, 1946), 208-224.
Presents findings of a study designed to determine interrelationships between
selected measures of performance and physiological adjustments to exercises
for the purpose of increasing understanding of exercise tolerance and its
assessment.
LARSON, LEONARD: "Cardiovascular-Respiratory Function in Relation to
Physical Fitness," Suppl. to Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. 12, No. 2 (May, 1941), 456-468.
Outlines the scope of the field of cardiovascular testing and
includes a discussion of factors influencing these measures, effects of training,
validity, problems of measurement, needed research, and a complete
bibliography.


MOREHOUSE, LAURENCE E. and MILLER, AUGUSTUS T., JR.: Physiology of
Exercise. St. Louis, The C. V. Mosby Company, 1948. Pp. 352.
Indicates the physiological bases of cardiovascular function and problems
of exercise thereto. See Chapters: VII, "The Heart"; VIII, "The Heart Rate
in Exercise"; IX, "The Circulation of the Blood"; and X, "Circulatory
Adjustments in Exercise."
SCHNEIDER, EDWARD C. and KARPOVICH, PETER V.: Physiology of Muscular
Exercise. Third Edition. Philadelphia, W. B. Saunders Company, 1948. Pp. 346.
In addition to several chapters on physiological bases of cardiovascular
function, Chapter 15, "Health and Physical Fitness," presents a description
of a number of "fitness" tests with a discussion of their possibilities and
limitations.

SELTZER, CARL C.: "Anthropometric Characteristics and Physical Fitness," Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 17, No. 1 (March,
1946), 10-20.
[...] anthropometric measurements and dynamic physiological [...]
adjustments for variations in stature [...]

WELLS, PHILIP V.: "Flarimeter Tests of Circulatory Fitness," Res. Quart. Am.
Phys. Educ. Assoc., Vol. V, No. 4 (December, 1934), 44-48.

The flarimeter is an instrument developed for blowing through an orifice at a
fixed pressure and a given rate. By taking account of the initial systolic drop,
the time for the systolic [...] 20 mm., and the length of blow, a
technique has been developed for measuring three fundamental attributes of
the circulatory function.


CHAPTER V
Athletic Achievement Tests and Scoring Scales


The history of progress in physical education in the United States
during the past eighty-five years indicates continuous concern with
the student's ability to perform, with his ability to accomplish
something, or gain in performance ability as the result of exertion.
In the early part of this eighty-five-year period, workers were
concerned with gains or increases in bodily measurements. Later came
an interest in increasing strength, and since about the turn of the
century a particular interest has been shown in measuring the
improvement of pupils in the acquisition of skills of various sorts.

The changing emphasis from gymnastics and calisthenics to
games and sports is in part responsible for the very striking changes
that have taken place in the testing program. The analysis of motor
ability into its component parts, such as speed, accuracy,
coordination, agility, balance, endurance and skill, demanded an
inspection of those activities and the choice of a few games or sports
that contain these elements. This, then, opened up the problem
of scoring or evaluating the achievements, and this again led to some
classification system by age, height, weight and school grade so
that competitions might be fairly conducted.

Early workers had little or no training in scientific statistical
procedures, and the scoring devices set up were crude and unscientific.

Until recently, the chief criticism leveled at scoring schemes was
that they were constructed arbitrarily and without regard to the
actual performances of students. Opinion and judgment by men
and women of experience in the field were taken as constituting fact.
Scores were arbitrarily assigned to performances and no one was
able to tell how closely these scores represented actual conditions,
or how closely a given score on one test represented the same relative
performance ability listed for an identical score on another test.
Use of appropriate statistical techniques has resulted in there being 
made available an increasing number of scientifically constructed 
achievement tests and scoring scales. 

From every standpoint the measurement of achievement seems
to play an important role in establishing an efficient program of
instruction. Scientifically constructed achievement tests and scoring
scales are an invaluable aid to both student and instructor. For the
student they (1) stimulate his interest in activities through a fair
evaluation of the performance he makes and the improvement
he shows, (2) give him a basis upon which he may judge his
standing in the group, and (3) point out to him weaknesses as well as
strengths, thereby assisting him to improve his deficiencies. They
are invaluable to the instructor (1) in showing him the skill
status of his classes, so that he may adapt his teaching to their
needs, (2) in checking up on his students as to whether or not his
teaching is being assimilated, and (3) as an aid in further research
and experimentation.


Elementary and Secondary Schools 


[...] efficiency of the boys and girls of the n[...] both boys and
girls were devised, each [...] of all-around skill. For boys, these
elements are: (1) [...], (2) jumping, (3) speed in running, and (4)
accuracy or strength in throwing. For girls, they are: (1) balancing,
(2) speed in running, (3) accuracy or strength in throwing, and (4)
accuracy in a game fundamental in either volleyball, tennis,
basketball or baseball. The [...] been revised from time to time [...]
statistical compilations, [...] many other similar tests is [...]
known. Is this the median performance of a large number of boys
and girls, is it what 60, 75 or 80 per cent can do, or just what does
this standard represent? Another query is: "Does the test assume
that the alternatives under each heading are measuring the same
phase of physical ability?" The question may well be put: "Does the
baseball throw for accuracy measure the same ability as is measured
by the baseball throw for distance?"

Richards' Efficiency Tests for Grade Schools.¹ A new
principle was added to testing programs by the establishment of
this test. Old grading schemes in the elementary school assumed
the passing mark to be 70 per cent. After tabulating the results of
testing 400 boys in seven events, Richards applied this passing mark 
to each of the events. The supposition would naturally be that any 
boy who applied himself could by the end of a given year reach the 
standard set for that year. In order that comparisons could be 
made with standards in other cities and states, it would be necessary 
to know just what this 70 per cent mark represented. Is it what 90 
or 95 per cent of the boys can do after training for a year, or, just 
what does the mark represent? What percentage of the boys fail 
to measure up to the passing standard at the end of this year? 
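The question raised here, what share of a tested group actually reaches a fixed standard, becomes a simple count once the performances are tabulated. The sketch below is illustrative only: the function name `percent_meeting_standard` and the sample performances are hypothetical, not Richards' data, and the `higher_is_better` flag is an assumption added so that timed events, where the smaller figure is the better one, can be handled too.

```python
from typing import Sequence


def percent_meeting_standard(
    performances: Sequence[float],
    standard: float,
    higher_is_better: bool = True,
) -> float:
    """Return the percentage of performances that meet or beat a fixed standard.

    Hypothetical helper: for jumps and throws a larger value is better;
    for timed runs pass higher_is_better=False, since a smaller time is better.
    """
    if not performances:
        raise ValueError("at least one performance is required")
    if higher_is_better:
        passed = sum(1 for p in performances if p >= standard)
    else:
        passed = sum(1 for p in performances if p <= standard)
    return 100.0 * passed / len(performances)


# Hypothetical standing broad jumps (inches) for eight boys, standard 60 inches.
jumps = [48, 52, 55, 60, 63, 66, 70, 72]
print(percent_meeting_standard(jumps, 60))  # 62.5

# Hypothetical 50-yard dash times (seconds), standard 9.0 seconds.
times = [8.0, 8.4, 9.1, 9.5]
print(percent_meeting_standard(times, 9.0, higher_is_better=False))  # 50.0
```

Subtracting the result from 100 gives the percentage of boys who fail to measure up to the standard, which is the figure the paragraph asks for.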

Philadelphia Public School Age Aim Charts. The new 
features brought forward in the Philadelphia Age Aim Charts, 
devised by Stecher² and his associates for the purpose of stimulating
interest in physical skills and furthering participation in athletic 
activities, included: 

1. Setting up age aims or standards of achievement for boys and 
girls. These were determined from the collection of field data and 
represented norms of performances for boys and girls from eight to 
sixteen in (1) the standing broad jump, (2) overhead ball throw, 
(3) 30-yard dash, (4) 40-yard dash, (5) 50-yard dash, and (6) chinning
for boys and knee raising for girls.

2. The awarding of a certificate of attainment to those who passed 
the standards in three athletic ability tests. 

3. The installation of a merit grouping plan which took into
consideration five factors. Pupils were rated on physical ability,
scholastic standing, removal of remediable physical defects, personal
hygiene and sportsmanlike conduct. Pupils receiving "C" in all
five factors were classed as Merit Pupils and awarded a certificate
of merit.

¹ Richards, J. H., "Efficiency Tests for Grade Schools," Am. Phys. Educ. Rev.,
XIX (December, 1914), 657.
² Stecher, Wm. A., retired as Director of Physical Education, Philadelphia Public
Schools, in 1927.

Reilly's Scheme—Rational Athletics for Boys and Girls.³
As Reilly himself expresses it, rational athletics was "primarily a 
plan for getting all boys and girls to take active part in real, live, 
athletic competitions as the best possible method for all-around 
physical development.” In order that boys and girls may be fairly 
matched for competition Reilly established a classification known 
throughout the country as the Age-Grade-Height-Weight Plan. 

Boys and girls were divided according to grade into two general 
divisions, namely, junior (fifth and sixth grades) and senior (seventh 
and eighth grades). 

The exponents in each group were "arbitrary numbers used to 
express the relationship between the various factors of grade, age, 
height and weight. They corresponded roughly to the four grades 
in each group—one ‘4’ for under-age, under-height, under-weight 
children; and one ‘9’ for over-age, over-height, over-weight children." 


TABLE V
REILLY'S AGE-GRADE-HEIGHT-WEIGHT CLASSIFICATION PLAN
Revised Classification for Boys and Girls

Junior Division (Grades 5-6)
Exponents | 4             | 5         | 6         | 7         | 8         | 9
Grade     |               | 5A        | 5B        | 6A        | 6B        |
Age       | up to 10 yrs. | 10:1-11:0 | 11:1-11:6 | 11:7-12:0 | [?]       | [?]
Height    | up to 4'2"    | [?]       | [?]       | [?]       | 5'-5'2"   | [?]
Weight    | up to 64 lbs. | 65-74     | 75-84     | 85-94     | 95-104    | 105 up

Senior Division (Grades 7-8)
Exponents | 4             | 5         | 6         | 7         | 8         | 9
Grade     |               | 7A        | 7B        | 8A        | 8B        |
Age       | [?]           | [?]-13:0  | 13:1-13:6 | 13:7-14:0 | 14:1-15:0 | 15:1 up
Height    | up to 4'4"    | 4'5"-4'8" | 4'9"-5'   | 5'1"-5'3" | 5'4"-5'6" | 5'7" up
Weight    | up to 74 lbs. | 75-89     | 90-104    | 105-119   | 120-129   | 130 up

Example: Boy in 5B ........... Exponent for Grade    6
         Age 10:6 ............ Exponent for Age      5
         Height 4'10" ........ Exponent for Height   7
         Weight 84 lbs. ...... Exponent for Weight   6
         Sum of Exponents ........................  24 = "Class B"

Class            | A        | B     | C     | D     | E
Sum of exponents | up to 21 | 22-25 | 26-29 | 30-33 | 34 or over
(Same for senior or junior division)

³ Reilly, Frederick J., New Rational Athletics for Boys and Girls. Boston, D. C.
Heath and Company, 1917.


Entering the table for a given grade, an exponent could be secured 
for each of the four factors. The sum of these, as shown in the ex- 
ample, gave a figure which put the pupil into a particular class group. 

Table V shows the details of Reilly's classification scheme, which 
was used in connection with standards of achievement in a variety 
of athletic events. 
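The lookup-and-sum procedure lends itself to a short program. The sketch below encodes only the weight rows of Table V and the class limits, which are legible in the table; the grade, age and height exponents are passed in by hand because several cells in those rows cannot be read reliably. The names `weight_exponent` and `reilly_class` are hypothetical, and weights are assumed to be recorded in whole pounds.

```python
# Weight bands from Table V: (heaviest weight in the band, in whole pounds, exponent).
# Anything heavier than the last listed band receives exponent 9.
WEIGHT_BANDS = {
    "junior": [(64, 4), (74, 5), (84, 6), (94, 7), (104, 8)],
    "senior": [(74, 4), (89, 5), (104, 6), (119, 7), (129, 8)],
}

# Upper limits on the sum of the four exponents (same for both divisions).
CLASS_LIMITS = [(21, "A"), (25, "B"), (29, "C"), (33, "D")]


def weight_exponent(pounds: int, division: str) -> int:
    """Look up the weight exponent for a 'junior' or 'senior' division pupil."""
    for heaviest, exponent in WEIGHT_BANDS[division]:
        if pounds <= heaviest:
            return exponent
    return 9


def reilly_class(grade_exp: int, age_exp: int, height_exp: int, weight_exp: int) -> str:
    """Sum the four exponents and map the total to a class letter A through E."""
    total = grade_exp + age_exp + height_exp + weight_exp
    for limit, letter in CLASS_LIMITS:
        if total <= limit:
            return letter
    return "E"


# The worked example from Table V: boy in 5B, age 10:6, height 4'10", weight 84 lbs.
w = weight_exponent(84, "junior")
print(w, reilly_class(6, 5, 7, w))  # 6 B
```

Keeping the bands as data rather than as nested conditionals means a revised table only requires editing the lists, not the logic.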

California Decathlon.⁴ This test, based upon achievement in
ten athletic events, introduced at the time a number of new ideas in 
the testing field. The ten events, which included fundamental move- 
ments of handling the body with the arms, the primary athletic 
activities of running and jumping, and skill in a number of athletic 
games, more closely approximated all-around ability than had 
previously been set forth. The zero achievement in each age scale 
represented a performance that any normal child could achieve, so 
that even the dub could at least score on the scale. The graduated 
score plan was introduced in some events by which, as difficulty of 
performance increased, score per unit of increment increased. An 
exponent plan was also used to classify students for rating achieve- 
ment. This test was an important milestone in the development of 
scientific achievement tests and scoring scales. 
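The graduated score plan means that each additional unit of performance is worth more points the farther the performance lies above the zero achievement. The actual California point values are not given here, so the band widths and point rates in this sketch (`BANDS`, `graduated_score`) are invented purely to show the shape of such a scale; a performance at or below the zero point is assumed to score nothing.

```python
import math

# Hypothetical scale: (width of band in performance units, points per unit in that band).
# Points per unit rise from band to band, so harder increments earn more.
BANDS = [(10.0, 1.0), (10.0, 2.0), (math.inf, 3.0)]


def graduated_score(performance: float, zero_point: float) -> float:
    """Score a performance on a hypothetical graduated scale.

    zero_point is the performance any normal child is assumed to reach;
    it scores 0, and only the excess above it earns points.
    """
    remaining = max(0.0, performance - zero_point)
    score = 0.0
    for width, rate in BANDS:
        taken = min(remaining, width)
        score += taken * rate
        remaining -= taken
        if remaining <= 0:
            break
    return score


print(graduated_score(25, zero_point=20))  # 5 units above zero: 5.0
print(graduated_score(45, zero_point=20))  # 25 units: 10*1 + 10*2 + 5*3 = 45.0
```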

Detroit Decathlon for Boys. This scheme of competition was
devised by the physical education department of Detroit for the
purpose of selecting the best all-around athletes in the city and
bringing them together in a meet to determine individual supremacy.
It is "in part a revival of the old Greek idea. Although the events
differ, there is an adherence to the spirit of the days when valor and
physical perfection were looked upon as the first essentials to a
healthy mind and spiritual greatness."⁵

Boys may qualify for three different medals: gold, silver and
bronze. Before the boy is eligible for any other event he must qualify
as follows in the chin, dip and sit-up:

For gold: 650 points in each event
For silver: 560 points in each event
For bronze: 460 points in each event
For bronze—460 points in each event 


⁴ Hetherington, Clark W., Manual in Physical Education for the Public Schools of
the State of California, Part IV, Syllabus on Physical Training Activities with
Methods of Management and Leadership, pp. 120-122. Sacramento, California
State Printing Office, 1918.
⁵ Pearl, N. H. and Brown, H. E., Health by Stunts, p. 155. New York, The
Macmillan Company, 1919.


The total numbers of points which must be obtained in ten events
for the three medals are 6500, 5600 and 4600, respectively.

"The main feature of the point table is that it furnishes an
incentive to bring oneself to an average of ability in each of the ten events
rather than to over-exert in any one or two."⁶
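The medal rules combine two conditions: a minimum in each of the three qualifying events and a minimum total over ten events. A short sketch follows, assuming (as the text implies but does not state outright) that the chin, dip and sit-up count among the ten events making up the total; `detroit_medal` is a hypothetical name.

```python
from typing import Optional

# (medal, minimum points in each of chin, dip and sit-up, minimum ten-event total)
MEDAL_STANDARDS = [
    ("gold", 650, 6500),
    ("silver", 560, 5600),
    ("bronze", 460, 4600),
]


def detroit_medal(chin: int, dip: int, sit_up: int, ten_event_total: int) -> Optional[str]:
    """Return the highest medal whose standards are all met, or None."""
    weakest = min(chin, dip, sit_up)  # every qualifying event must reach the mark
    for medal, per_event, total in MEDAL_STANDARDS:
        if weakest >= per_event and ten_event_total >= total:
            return medal
    return None


print(detroit_medal(700, 660, 655, 6600))  # gold
print(detroit_medal(700, 600, 655, 6600))  # silver: dip is below 650
print(detroit_medal(700, 700, 700, 5000))  # bronze: total is below 5600
print(detroit_medal(450, 700, 700, 7000))  # None: chin is below 460
```

Testing the weakest of the three qualifying scores is what enforces the intent quoted above: a high total cannot earn a medal if any one of the basic events lags.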

Los Angeles Achievement Expectancy Tables. Although no 
longer in use, another milestone in the development of athletic 
achievement tests was the work undertaken in the construction of 
the Los Angeles Achievement Expectancy Tables. In 1925 the 
Department of Physical Education of the Los Angeles City School 
District set for itself the problem of solving the question, “What 
may we expect of our boys and girls in the matter of performance 
in a variety of physical skills?" By 1927 achievement expectancy
tables in thirteen self-testing events for grades five, six, seven and
eight had been computed. The tables took into consideration both
age and grade, since it was recognized that the same performance
could not be expected from a ten-year-old as from a twelve-year-old,
even though they were in the same grade. In addition, standards were
set on five levels from "failing" to "exceptionally good" for each
age-grade group. This study represented an immense amount of
statistical labor, as it involved approximately 172,000 records. The
events included in the program were:

Fall
    Soccer kick for distance.
    Basketball throw for distance.
    Basketball throw for goal.
    Handball serve.
    Volleyball serve.

Spring
    Baseball throw for strike.
    Batting for accuracy.
    Tennis serve.
    Baseball throw for distance.

    Running broad jump.
    Standing hop, step and jump.
    Running high jump.
    Standing three jumps.

The Motor Ability Tests.⁷ In 1926 a National Committee of
the American Physical Education Association made the following
report on motor ability tests. The purposes of the tests were agreed
to be:

1. Securing motor ability related to physical fitness and health.

2. Securing motor ability related to the skills in trades and industries.

3. Securing skills needed for a well-rounded recreative life.

4. Development of bodily control as a protection from accidents.

5. Teaching the fundamental team skills because of their aid in
developing social and moral values.

⁶ Ibid., pp. 163-164.
⁷ As reported by the National Committee of the American Physical Education
Association, March, 1926.

The committee recommended eight groups of activities in which 
tests were to be developed, running from the first grade through 
the college age. 


1. Free exercises without the use of hand apparatus. 

2. Calisthenics with the use of hand apparatus. 

3. Marching. 

4. Dancing. 

5. Track and field athletics. 

6. Team game activities.
   a. Tests for rugby football.
   b. Tests for soccer.
   c. Field hockey tests.
   d. Baseball.
   e. Basketball.
   f. Tennis.


7. Apparatus Tests. 

8. Swimming. 

Tests were completed in groups 1, 5, 6 and 7. 

Rogers' Athletic Index.⁸ Mention elsewhere⁹ is made of the
work of Rogers in formulating a strength test as a measure of general 
athletic ability and a physical fitness index “which will guide the 
physical director in allocating physical education programs to 
individuals in accordance with their needs.” 

Rogers further contributes what he has termed the athletic index. 

The empirical formula adopted as the athletic index is as follows: 

Strength Index divided by 10 plus 
Physical Fitness Index plus 
Intelligence Quotient equals 
The Athletic Index 
An example is given to illustrate the athletic index more concretely:

George Jackson, Age: 17 years 6 months
Weight: 172 pounds
I.Q.: 116
S.I.: 2680
Normal S.I.: 2436
P.F.I.: 106

S.I. ÷ 10 = 268
P.F.I.    = 106
I.Q.      = 116
Athletic Index = 490
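The formula can be checked mechanically. The function below is a direct transcription of the empirical formula stated above (its name, `athletic_index`, is hypothetical); applied to George Jackson's figures it reproduces the index of 490. Age, weight and the normal S.I. listed in the example do not appear directly in the formula.

```python
def athletic_index(
    strength_index: float,
    physical_fitness_index: float,
    intelligence_quotient: float,
) -> float:
    """Rogers' empirical athletic index: S.I. / 10 + P.F.I. + I.Q."""
    return strength_index / 10 + physical_fitness_index + intelligence_quotient


# George Jackson's figures from the example above.
print(athletic_index(2680, 106, 116))  # 490.0
```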
⁸ Rogers, Frederick Rand, Physical Capacity Tests in the Administration of Physical
Education, pp. 127-133. New York, Teachers College, Columbia University
Contributions to Education No. 175, 1925.
⁹ See Chapter VII, p. 128, and Chapter IX, p. 182.


Bliss' Study of Progression.¹⁰ Bliss made a valuable
contribution to the measurement program in making an analysis and
scientific study of progression in physical education activities. His
primary purpose was to determine the degree of development in
strength and skill of junior high school boys and girls in the
fundamental elements of big muscle activities. The study also aimed to
formulate a standard motor achievement test. Bliss made a
kinesiological analysis of physical education activities, and states that
"strength, speed and range are the attributes of menti-motor power
or capacity." In his experimental study he used twelve different
athletic achievement type tests. Some of his conclusions were:
that there is an extensive overlapping of abilities in the different
age ranges; that there is a wide range of ability among children in the
fundamental skills; that "a distinct progression is present in strength, and
skill according to age, sex, and individual differences" in some
activities tested; and that it seems logical to believe that with proper
experimentation and the collection of scientific data "progression
may be constructed on a 100 per cent scale according to age and sex
differences and individual differences within each age."

The Cleveland Physical Ability Test for Boys.¹¹ In an effort
to section boys in physical education according to ability within each 
school grade, Cleveland constructed a test based upon primary 
activities of running, jumping, climbing and throwing. Scoring 
tables were prepared for events in each of the primary activities 
as well as lung capacity and grip. Norms for age and weight in 
relation to total score were established and a physical ability index 
could be computed by dividing the actual score by the norm. 
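Because the index is a ratio, a value above 1 marks a boy who exceeds the norm for his age and weight, and a value below 1 marks one who falls short of it. The sketch assumes the norm has already been read from the age-and-weight table; the function name `physical_ability_index` and the sample scores are hypothetical.

```python
def physical_ability_index(actual_score: float, norm_score: float) -> float:
    """Total score divided by the norm for the pupil's age and weight."""
    if norm_score <= 0:
        raise ValueError("norm score must be positive")
    return actual_score / norm_score


# Hypothetical figures: a boy scores 450 points where the norm for his age and weight is 400.
print(physical_ability_index(450, 400))  # 1.125
```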

Jenkins' Motor Achievements of Children. This study¹²
contains a survey of the motor achievements of three hundred
children, boys and girls aged five, six and seven, in nine events:
35-yard dash, 50-foot hop, beanbag toss for accuracy, baseball throw
for distance, soccer kick for distance, baseball throw for accuracy,
standing broad jump, running broad jump, and jump and reach.

¹⁰ Bliss, "A Study of Progression Based on Age, Sex and Individual
Differences in Strength and Skill," Am. Phys. Educ. Rev., XXXII (January,
1927), 1-21, and XXXII (February, 1927), 88-[?].
¹¹ Rowe, Floyd A., "Cleveland Experiments with Physical Ability Tests,"
Pentathlon, Vol. I, No. 5 (February, 1929), 5-9, 45.
¹² Jenkins, Lulu Marie, A Comparative Study of Motor Achievements of Children of
Five, Six and Seven Years of Age. New York, Bureau of Publications, Teachers
College, Columbia University, 1930.


It is a valuable contribution for those who are interested in
estimating and comparing the motor development of children of
this age in many of the elements used in the child's play life:
running, hopping, jumping, throwing and kicking.

On the whole the boys were superior to the girls in all events 
except the 50-foot hop, though this superiority was not significant 
in some of the events such as the beanbag toss for accuracy, the 
baseball throw for accuracy, and the jump and reach. 

Fundamentals of Motor Performance for Secondary School
Girls.¹³ The Fundamentals of Motor Performance Test Battery
was prepared in connection with the New York State Wartime
Physical Fitness Program. This battery was one part of the New
York State Physical Fitness Standards, developed during World
War II by the New York State Education Department, Division
of Health and Physical Education, in cooperation with the State
War Council, Office of Physical Fitness, as a phase of the general
program to improve the effectiveness of state high school physical
education programs. Tests and standards in a wide variety of
athletic events for boys,¹⁴ and in physical education activity skills
for girls,¹⁵ were developed by teacher committees from the field and
members of the state staff.¹⁶ While work on the standardization
and validation of the tests was not completed, the program merits
note because it typifies the wartime interest of state and local groups
throughout the country in such endeavors, and because it has formed
the basis for an ambitious tests and standards development program
under way in New York State.

The Fundamentals of Motor Performance Test has proved of
special worth where used in the schools of New York State. It was
designed to sample the fundamental motor skills, including running,
jumping, throwing, catching, striking and kicking, and the basic
components of motor fitness, including strength, muscular endurance,
power, agility, coordination and balance. The items in the battery
consist of: the Top (Brace Test Stunt No. 7); shuttle-run, 20
seconds; standing broad jump; soccer kick for accuracy; striking, a
handball-type test; push-ups, 1 minute; squat-thrusts, 30 seconds;
and ball handling (a wall pass type test involving shifting of foot
position). The test scoring scales and descriptions for administering
the items are no longer available from the original source, but Rodgers¹⁷
gives an excellent treatment of the theory behind the development
of this test, the details of the test including scoring tables, and some
suggested modifications.

¹³ New York State Physical Fitness Standards, "Fundamentals of Motor
Performance: Secondary School Girls," Suppl. to Evaluative Procedures in
Physical Activities, Girls and Young Women. New York State War Council,
Office of Physical Fitness, and Division of Health and Physical Education,
New York State Education Department, Albany, New York, 1946.
¹⁴ New York State Physical Fitness Standards, Physical Fitness Standards for
Boys and Young Men: A Manual for Instructors. New York State War
Council, Office of Physical Fitness, and Division of Health and Physical
Education, New York State Education Department, Albany, New York, 1944.
¹⁵ New York State Physical Fitness Standards, Evaluative Procedures in Physical
Activities, Girls and Young Women. New York State War Council, Office of
Physical Fitness, and Division of Health and Physical Education, New York
State Education Department, Albany, New York, 1944.
¹⁶ See page 217 for description of entire program.


Colleges and Universities 


Meylan’s Test for Grading in Physical Education. Meylan, 
of Columbia University, pioneered in setting up standards of 
achievement for purposes of grading in physical education. His 
first comprehensive test of big muscle activity was developed in 1905 
and embraced three general phases of grading or marking in physical 
education: (1) health, (2) vitality, (3) bodily control. The test as 
originally designed included two aspects of health and a number of 
aspects of bodily control. The bodily control tests were all per- 
formed on apparatus. It is interesting to note the changes in 
Meylan's tests over a period of years.¹⁸ By 1913 he had increased
his objectives to include posture, knowledge of exercises and games,
knowledge of principles of hygiene, ability in athletic events as well
as apparatus events, and skill in swimming and diving. Still later
he introduced the plan whereby men were classified according to their
achievement in a test consisting of the one-lap run, running high
jump, bar vault, rope climb and swimming. Men in the excellent
group were permitted to elect activities in the physical education
curriculum, those in the average group were assigned to regular
physical education classes, and those in the below average group
were assigned to special classes. This plan is still used in many
programs today, although a variety of classification tests are
used.

¹⁷ Rodgers, Elizabeth G., "Evaluation of the Fundamentals of Motor
Performance," Jr. Health and Phys. Educ., Vol. 18, No. 4 (April, 1947), 225-228.
¹⁸ Williams, J. F., The Organization and Administration of Physical Education,
pp. 276-278. New York, The Macmillan Company, 1923.


Sigma Delta Psi. In order to stimulate interest in all-around 
athletic endeavor, there was started at the University of Indiana 
in 1912 a national athletic fraternity known as Sigma Delta Psi. 
The membership of the fraternity has grown until now it numbers 
chapters in many of our leading colleges and universities. The 
present (1948) requirements for membership are: 


 1. 100 yd. dash             11 3/5 sec.
 2. 120 yd. low hurdles      16 sec.
 3. Running high jump        5 ft.
 4. Running broad jump       17 ft.
 5. 16 lb. shot put          According to a man's weight; 30 ft. for a
                             man weighing 160 lbs. or over.
 6. Rope climb               20 ft. in 12 sec.
 7. Throwing baseball        250 ft. on fly
 8. Punting football         120 ft. on fly
 9. Swimming                 100 yds. in 1 min. 45 sec.
10. One-mile run             6 min.
11. Tumbling                 (a) Front handspring
                             (b) Fence vault with bar at chin height
                             (c) Handstand, 10 sec.
12. Posture                  Erect carriage
13. Scholarship              “C” plus

Note: If the candidate has won a letter in a varsity sport, he may
substitute this for any of the above requirements except swimming.
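The membership test above is a set of pass-or-fail standards with one substitution rule. A minimal sketch of checking a candidate, assuming a small illustrative subset of the requirements (times converted to seconds, distances in feet), hypothetical dictionary labels for the events, and reading the note to mean that a varsity letter can replace a single failed requirement other than swimming:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    value: float
    lower_is_better: bool  # True for timed events, False for distances and heights


# Illustrative subset of the 1948 requirements; keys are hypothetical labels.
STANDARDS = {
    "100 yd. dash": Standard(11.6, True),         # 11 3/5 sec.
    "120 yd. low hurdles": Standard(16.0, True),
    "running high jump": Standard(5.0, False),
    "running broad jump": Standard(17.0, False),
    "swimming": Standard(105.0, True),            # 100 yds. in 1 min. 45 sec.
    "one-mile run": Standard(360.0, True),        # 6 min.
}


def meets(standard: Standard, result: float) -> bool:
    """True if the result equals or betters the standard."""
    if standard.lower_is_better:
        return result <= standard.value
    return result >= standard.value


def qualifies(results: dict, has_varsity_letter: bool = False) -> bool:
    """Check a candidate against STANDARDS; a missing event counts as a failure."""
    failed = [event for event, std in STANDARDS.items()
              if event not in results or not meets(std, results[event])]
    if not failed:
        return True
    # Assumption: the letter replaces exactly one failed requirement,
    # and it may never replace swimming.
    return has_varsity_letter and len(failed) == 1 and failed[0] != "swimming"
```

For example, a candidate who misses only the hurdles standard still qualifies if he holds a varsity letter, while one who misses only the swimming standard does not.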


Schuettner's Scheme for Stimulating Interest. A. J.
Schuettner, working at the University of Illinois in 1919, set forth
a very elaborate plan of testing to stimulate interest among the
men in physical education.19 Though the plan has not been followed
since Mr. Schuettner left the University, it represents a phase in the 
development of testing and should be recorded for that reason. 

The tests were divided into five sections:

1. Track and field athletics.
High jump, broad jump, 100-yard dash, 50-yard burden run,
12-pound shot-put, grenade.

2. Aquatics.
Speed swim, distance swim, strokes, dives, rescue.

3. Antagonistics.
Boxing, wrestling, fencing.

19Schuettner, A. J., “The University of Illinois Plan to Stimulate Interest in … for Men,” University of Illinois Bulletin, Vol. 16, No. 55, April 14, 1919.


100 The Status of Measurement in Physical Education 


4. Gymnastics.
Chinning, dipping, hand vault, dive and roll, free exercises,
apparatus exercises.

5. Intercollegiate athletics.
Mass athletics, varsity squad, freshman varsity squad.


Twenty-three events in all were arranged on a “pointage system”
so that the “average student, willing to exert himself,” could qualify
for the emblem of the lowest division, “while further effort on his
part will result in his raising his total pointage and securing the
insignia of each of the five intermediate divisions, and finally the
highest award.”

“To receive an emblem the student is required to score a certain
number of points, with a maximum allowance and a minimum re-
quirement in each of the five sections. This does not force him to
score in every one of the 23 events, as he may choose one or a few of
the events in each section, but it does require moderate ability in
each branch of physical education.”
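The quoted rule combines a total-point requirement with a minimum and a maximum in each section. The sketch below is one way to read it. The section limits (EXAMPLE_MINIMUM, EXAMPLE_MAXIMUM), the required total and the function name emblem_total are hypothetical, since the plan's actual figures are not given here.

```python
# The five sections of the plan; the limits below are hypothetical example values.
SECTIONS = (
    "track and field athletics",
    "aquatics",
    "antagonistics",
    "gymnastics",
    "intercollegiate athletics",
)
EXAMPLE_MINIMUM = {s: 5 for s in SECTIONS}   # hypothetical minimum per section
EXAMPLE_MAXIMUM = {s: 30 for s in SECTIONS}  # hypothetical maximum allowance


def emblem_total(points, minimum, maximum, required_total):
    """Return the counted total if an emblem is earned, otherwise None."""
    total = 0
    for section in SECTIONS:
        earned = points.get(section, 0)
        if earned < minimum[section]:
            return None  # below the minimum requirement in this section
        # Points above the section's maximum allowance do not count.
        total += min(earned, maximum[section])
    return total if total >= required_total else None
```

The cap is what keeps a strong runner from earning an emblem on track points alone. Under these example limits, 50 track points count as only 30, so the student still needs moderate scores in every other section.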

Metcalf's Standards Proposed to the College Directors’
Society. In order “to get information concerning the ability and
the outstanding weaknesses of a group, to determine the type
of work most needed to round out their development, and to
supplement the routine physical examination as a basis for
classification,”20 Metcalf proposed certain minimum and maximum
standards of achievement in a number of events involving the
fundamental movements of running, jumping, throwing and climbing.
The particular levels of performance which he has suggested “have
no special significance but merely show the method of applying the
percentage tables” to the actual performances of college men. “In a
test of general physical efficiency for stimulating interest these
standards might well represent the novice, athletic and honor
divisions. To qualify for a division one might be required to equal
or better the standards in ten of the fifteen events, including at
least one event in each of the five groups.”

Metcalf provided standards computed from records of actual
performances, indicating what could be expected of 90 to 95 per
cent, of 50 to 60 per cent and of 20 to 25 per cent of the men in a
college group. In setting up a scoring table, material of this sort
might be used as a basis, for a statement to the effect that thirteen
seconds in the 100-yard dash represents a figure which can be
equalled or surpassed by all but 5 to 10 per cent of college men is a
rather definite starting point and one which is easily understandable
by all.

20Metcalf, T. N., “Standards and Tests in Physical Education,” Am. Phys. Educ. Rev., XXVII (September, 1922), 322-326.
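The qualifying rule quoted above has two parts: a count (ten of fifteen standards) and a coverage condition (at least one event in each of the five groups). A minimal sketch, assuming a hypothetical layout of fifteen events named "event 1" to "event 15", three to a group, since Metcalf's actual event list is not reproduced here:

```python
# Hypothetical layout: fifteen events spread over five groups, three per group.
EVENT_GROUP = {f"event {i}": f"group {i % 5 + 1}" for i in range(1, 16)}


def qualifies_for_division(events_met, event_group=EVENT_GROUP, needed=10):
    """True if at least `needed` standards were met and every group is represented.

    `events_met` is the set of events in which the man equalled or bettered
    the division's standard.
    """
    groups_covered = {event_group[event] for event in events_met}
    return len(events_met) >= needed and groups_covered == set(event_group.values())
```

The coverage check is what rules out a specialist: meeting twelve standards that all fall in four of the groups still fails.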

National Amateur Athletic Federation Physical Efficiency 
Standards. This federation upon the recommendation of the 
conference called in Washington, D. C., November, 1922, by the 
Secretary of War, John W. Weeks, adopted for 1925 standards for 
promoting the physical efficiency of the youth of the nation. Dr. 
J. H. McCurdy was elected a committee of one to recommend 
standards for adoption.21 Standards were set in the 100-yard dash,
running broad jump, running high jump, and bar vault.

The factors which determined the selection of events are: “interest 
in the events, cost of equipment and simplicity in conducting large 
group activities, with a scale of measurement that will show pro- 
gressive improvement of individuals from year to year, through 
junior high, senior high and college." 

Oberlin College Test. The Department of Physical Education
for Men at Oberlin College has made use of a qualifying test for
eligibility to an elective program in physical education.22 As will
be noted, the test includes a sampling of practically all the elements
which can reasonably be supposed to be included in all-around
athletic ability. For each performance standard listed, an award of
10 points is given.

Element                Event                                  Standard
1. Running             176 yards (two laps, in...)            24 seconds.
2. Jumping             Running high jump                      4 feet 10 inches.
3. Vaulting            On low horizontal bar                  47 inches.
4. Climbing            20-foot rope, kneeling start           12 seconds.
5. Pulling and lifting Two backward circles on high horizontal; hanging
                       start, arms extended, body motionless.
6. Pushing             Dips on parallel bars                  10 times.
7. Throwing            Baseball target throw for accuracy,    3 hits out of 5
                       18-inch circle                         throws at 60 feet.
8. Swimming            100-yard free style                    1 minute 45 seconds.
9. Tumbling            Hand spring
10. Balancing          Hand stand; movement confined to       10 seconds.
                       4-foot diameter circle

21McCurdy, J. H., Physiology of Exercise, p. 211. Philadelphia, Lea and Febiger, 1925.

United States Military Academy Physical Efficiency Test. 
A scientifically constructed physical efficiency test has been de- 
veloped for use at the United States Military Academy. The 
test is designed to measure muscular strength, agility, power, 
coordination, endurance, speed and skill. The battery items 
include vertical jump, bar vault, dodge run, standing broad jump, 
sit-ups, chins, dips, softball throw, 300-yard run and rope climb,
all of which require a minimum of standard gymnasium equipment. 
T-scores have been constructed for the events. Cadets are required 
to meet minimum standards. A physical aptitude test has also been 
proposed as a basis for admission to the Academy. Tests were 
developed when studies revealed that cadets lowest in physical 
efficiency tended to drop out of the Academy for some reason. The 
plan calls for the physical aptitude test to be administered with 
other entrance examinations. A number of events have been studied 
and scaled, and exact items in the battery will vary from year to 
year. Typical of batteries to be used are: Battery I: vertical jump, 
medicine ball put, chinning, dodge run, sit-ups, 300-yard shuttle run. 
Battery II: standing broad jump, medicine ball put, dipping, dodge 
run, sit-ups, 300-yard shuttle run. Events are weighted, and 
separate standards for three weight classes are provided for the 
medicine ball put. 

Achievement Tests in Activities for Physical Education
Teachers in Training. “It is clearly recognized by all that activities
form the working tools of our profession and that it is essential for
the teacher of physical education to perfect himself in a wide range
of skills common to the varied activities in a sound physical
education program.”23 If the student has acquired an adequate
background of activities in the high school, he will be able to devote
more time and attention to preparing himself for the actual job of
teaching.

22Department of Physical Education for Men, Oberlin College, “Qualifying Test for Elective Program in Physical Education,” Jr. Health and Phys. Educ., Vol. VII, No. 8 (October, 1936), 512.

23Nixon, Eugene W., “Achievement Tests in Activities for Men Physical Education Teachers in Training,” Jr. Health and Phys. Educ., Vol. IV, No. 5 (May, 1933), 50.




As a matter of fact, it seems impossible for the teacher
education institution to guarantee that the student shall receive all 
that he should in the way of specialized preparation for teaching 
as well as a thorough preparation in the technique of a wide range of 
activities. In order to place the responsibility for the activity 
training upon the student, a number of institutions have already 
set up batteries of achievement tests in a wide variety of activities 
in which students may qualify at any time during their collegiate 
training but which must be passed before graduation. By proper 
use of such batteries of tests, institutions may make certain that 
their students have acquired a reasonable measure of experience, 
skill and knowledge in the activities which form the working tools 
of their profession. 

Though at present no national achievement scales have been 
formulated, experiments in many institutions are being carried on, 
and in the future we look forward to definite progress in this
measurement problem.

The University of Illinois Plan. Though not designed for this 
particular purpose, it would appear that a plan proposed at the 
University of Illinois might be used as a basis for the development 
of achievement tests.24

The University of Illinois plan provides for a proficiency exami- 
nation in twenty-eight physical education activities. Students are 


given an opportunity to take these proficiency examinations and, 
if they pass, are given University credit as would be the case if they 
had actually taken the courses. In general the proficiency examina- 
tion consists of three parts: (1) a written examination covering 
rules, strategy, etc., (2) a demonstration examination covering
skills in the particular activity, and (3) a performance examination 
in a competition. Wherever possible all three parts of this exami- 
nation have been made objective. For example, the proficiency 


examination in tennis consists of the following:

1. A written examination covering rules and strategy (10 questions,
1 point each).

2. A demonstration examination covering skills (5 skills, 2 points
each). Eight skills are listed: serving, returning a service, executing
a forehand drive, a backhand drive, an overhead smash, a forehand
volley, a backhand volley, and a lob.

3. A performance examination consisting of playing a match
(40 points as a possible perfect score).

Students scoring a total of 40 or more points are awarded credit.

Such a plan as this, worked out in detail for every activity with
which the major student should be familiar, would provide teacher
education institutions with tools by which they could definitely rate
the skill and knowledge of prospective teachers.

24Staley, Seward C., A Sports Curriculum. Champaign, Illinois, Bailey and Himes, Inc., 1956.
25Ibid., p. 70.
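The point arithmetic of the tennis examination is worth making explicit. The perfect score is 10 + 10 + 40 = 60, and credit requires 40, so a perfect match alone is just enough. A small sketch, assuming the grader supplies the match points directly (the source does not say how they are assigned); the function name is our own:

```python
def tennis_proficiency(written_correct, skills_passed, match_points):
    """Return (total, credit_awarded) for the three-part tennis examination.

    How match points are assigned is not specified, so they are taken as given.
    """
    written = min(written_correct, 10) * 1       # 10 questions, 1 point each
    demonstration = min(skills_passed, 5) * 2    # 5 skills, 2 points each
    performance = min(match_points, 40)          # match play, 40 points possible
    total = written + demonstration + performance  # perfect score is 60
    return total, total >= 40
```

For example, 8 correct answers, 4 skills passed and 25 match points give 8 + 8 + 25 = 41 points, which earns credit.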


Scoring Scales and Standards 
McCloy’s Scoring Tables.26 McCloy has pointed out that the
application of the T-scale in setting up scoring tables for physical
education activities is questionable except when homogeneous
groups are being considered. He proposes a scale which will lend
itself to “universal scoring table use” and which is built on the
principle of an increased-increment award as performances become
more difficult. The increments vary according to a parabolic curve
($Y = X^{n}$), where n is the exponent best representing the particular
event in question. In the case of the dash it is …, while for the
running broad jump it is 1.66. Computation of the tables therefore
demands the use of logarithms. The scales have a range of 1000
points with the approximate world's record at 900, and assume that
field events begin at zero performance and track events at infinity
(zero velocity). These tables are especially valuable in making
comparisons of totally different groups.

There are fifty scoring tables for a wide variety of strength, weight-
throwing, running, walking, jumping and vaulting events. From
these scales an Athletic Quotient can be computed by comparing
actual achievement to norm achievement. The norms were
developed from a sample grouped by McCloy’s Classification Index.
The Athletic Quotient was devised to equalize for size and maturity.
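The increased-increment principle can be illustrated with a short sketch. It assumes points follow $Y = kX^{n}$, with the constant k chosen so that an approximate world's record (a value the user supplies) scores 900. McCloy's published tables involve further details not given here, so this shows only the shape of the curve, not his actual tables. Following the text, track events are scored on velocity, so an ever slower time approaches zero points. The function names are hypothetical.

```python
import math


def increased_increment_points(x: float, x_record: float, n: float) -> float:
    """Illustrative score on a parabolic curve Y = k * X**n.

    k is chosen (an assumption of this sketch) so that the approximate world's
    record x_record scores 900 points; zero performance scores zero.
    """
    if x <= 0:
        return 0.0
    # X**n evaluated with logarithms, as hand-computed tables required.
    return 900.0 * math.exp(n * (math.log(x) - math.log(x_record)))


def track_velocity(distance: float, seconds: float) -> float:
    """Track events are scored on velocity, which tends to zero as time grows."""
    return distance / seconds
```

With n greater than 1, as for the running broad jump, each additional foot is worth more than the one before it. That is the increased-increment award the text describes.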


The California Achievement Scales

Vast quantities of data have been collected in California for the
purpose of setting up achievement scales for homogeneous groups.

26McCloy, C. H., The Measurement of Athletic Power, Chapter III and Appendix. New York, A. S. Barnes and Company, 1932. Also Tests and Measurements in Health and Physical Education, pp. 110-113, F. S. Crofts and Company, 1942.




The studies involved a set of scales for: (1) the Junior Pentathlon 
Program, (2) Boys and Girls in the Elementary and Junior High 
School, (3) Boys in the Secondary School, (4) College Men, (5) 
Secondary School Girls and College Women, and (6) Physical 
Fitness Pentathlon. 

The Junior Pentathlon Program.27 While no longer used
for its original purpose, this program was among the earliest utilizing
age-height-weight classification with scientifically constructed 
achievement scales. The Junior Pentathlon Program was an out- 
growth of the Junior Olympics sponsored at one time by leading 
newspapers in states west of the Mississippi. Its purpose was to 
foster the all-around development of boys from ten to sixteen years 
of age in a series of skills represented by five individual events 
including the dash, the running high jump, the running broad jump, 
the two-minute basketball goal throw and the ball put (a 5-pound, 
17-inch, leather-covered ball). In order to equalize competition, 
a classification scheme based on the factors of age, height and weight 
was computed.28 The scoring scale was based on a range of 1000 
points, with the zero end of the scale at three standard deviations be- 
low the mean and the 1000-point level at three standard deviations 
above the mean, so that fine distinctions in performance could be 
recognized. Point values were computed for each set of five co- 
efficients, the coefficients representing the exponent numbers given 
to the factors of age, height and weight. Since the range in co- 
efficients was wide, there were actually thirty-three classes or 
groups of exponents to be considered. For any given time, distance 
or height in the five events a definite number of points could be com- 
puted, the sum of which represents the individual's composite 
ability. The scales were used throughout the season (February 
to June) to check on the progress of the boy in his acquisition of 
skills and to determine his final rating at the end of the season. 
Schools and playgrounds found the program interesting and profit- 
able as a motivating device. 

Achievement Scales for Boys and Girls in Elementary and
Junior High Schools. Following the research to determine the
validity of the California Plan of classification,29 a large program
of measurement was undertaken through the cooperation of
twenty-seven school districts throughout the state. Standardized
directions were formulated and tests given to girls in twenty events
and to boys in thirty-three events. In all, approximately 79,000
records were gathered as a basis for the formulation of the scales. In
each event there were eight groups, classes A to H, so that in reality
424 scales were set up, with an average of 1500 records for each
individual event.

27These scoring charts were computed by Frederick W. Cozens and published by The Los Angeles Times. They are now out of print.
28 … p. 117.
29 … N. P., “Age, Height and Weight as Factors in the Classification of Elementary School Children,” Jr. Health and Phys. Educ., Vol. III, No. 10 (December, 1932), 21, 58.

For each class in each event, means and standard deviations were 
computed and smoothed. Since the classes A through H represent 
homogeneous groups, an even-step interval plan of scoring was 
adopted. Because the performances of these homogeneous groups 
ranged between three standard deviations on each side of the mean, 
it seemed logical to believe that no provision need be made for per- 
formances exceeding this range. A uniform method of procedure in 
scoring was adopted giving a score of 100 to performances at three 
standard deviations above the mean, 50 to performances at the 
mean and 0 to performances at three standard deviations below the 
mean. Increments for each score increase were computed by dividing 
100 into six times the value of the standard deviation. Since each 
scale was constructed by this method, any given score on one scale 
has the same relative performance value as an identical score on 
another scale. This permits the addition or averaging of scores to 
get total or average achievement. 
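The even-step rule just described can be written as a short function. One scale point equals $6\sigma/100$ of performance, the mean scores 50, and scores are held within 0 to 100 because no provision is made for performances beyond three standard deviations. This is a sketch, not the published tables. The lower_is_better flag for timed events and the top parameter are our own additions: top=1000 gives the 1000-point range of the Junior Pentathlon scales, which used the same plus-or-minus three standard deviation limits.

```python
def even_step_score(performance, mean, sd, lower_is_better=False, top=100):
    """Even-step scale score: 0 at -3 SD, top/2 at the mean, top at +3 SD."""
    step = 6 * sd / top               # performance change per scale point (6σ/100)
    points = (performance - mean) / step
    if lower_is_better:               # e.g. dashes: faster (smaller) times score higher
        points = -points
    score = top / 2 + points
    return max(0.0, min(float(top), score))  # no provision beyond ±3 SD
```

For a class with mean 40 and standard deviation 5, a performance of 46 is 20 steps of 0.3 above the mean and scores 70. Because every scale is built the same way, such scores from different events can be added or averaged, as the text notes.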

In the computation of these scales, interesting facts were obtained 
regarding variations in performance among the classes in events. 
In some events the size of the standard deviation in the various 
classes is practically the same and in these an average was taken by 
which to compute score increases. In other events, the size of the 
standard deviation increased with an increase in the stature of the 
boys and girls. In still others, the size of the standard deviation
decreased as the boys and girls became older, taller and heavier.
These facts were taken into consideration when computing score
increases.30

30For more detailed information, see Neilson, N. P., and Cozens, Frederick W., Achievement Scales in Physical Education Activities for Boys and Girls in Elementary and Junior High Schools, p. 167. New York, A. S. Barnes and Company, 1934.

Achievement Scales for Boys in Secondary Schools. This
study was planned as the second of the series to provide scoring
scales in physical education activities for students from the
elementary school through the college age level. As will be shown in
Chapter VI, “Indices for the Classification or Grouping of Students,”
the scheme of classifying secondary school boys differs somewhat from
that used for elementary schools on account of the higher age range
and type of activity. Hence before scales could be set up, it was
necessary to validate a new classification plan.31 The data for this
study were gathered in the Los Angeles City School District and
surrounding communities. Over 56,000 performance records were
used as a basis for the construction of scales in forty-five events,
with six classes in each event, an average of 1250 records per event.

As in the case of the elementary study, means and standard
deviations in each event were smoothed and the smoothed values
taken as a better approximation than the computed figures. It was
again found that there were three types of variability or spread in
the performance records: uniform standard deviations, standard
deviations increasing with an increase in physical stature, and
standard deviations decreasing as physical stature increases.

Since the various classes from A (larger boys) to F (smaller boys)
represent homogeneous groups, the even-step interval plan of scoring
was again adopted. Scores were computed on the same basis as in
the elementary study, that is, a range of three standard deviations
on each side of the mean representing a range of 100 scores, with the
mean at 50, the performance level of three standard deviations
above the mean at 100 and that of three standard deviations below
the mean at 0.

Achievement Scales in Physical Education Activities for
College Men. The problem of classification for college men and the
advantages of having a height-weight classification (such as
tall-slender) will be discussed in Chapter VI. Here again a
homogeneous grouping permits an even-step interval plan of scoring,
with the range in scoring used as indicated in the elementary and
secondary studies.

In this study,32 in contrast to the findings in the other studies,
the standard deviations of all height-weight classes (such as
short-heavy, etc.) were approximately the same and were averaged to
make the best-fit sigma of a particular event. Scale increases, as for
example from 50 to 51, 77 to 78, etc., were computed according to the
procedure previously described ($\frac{6\sigma}{100}$, where σ is the
best-fit standard deviation of the event).

31Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., Physical Education Achievement Scales for Boys in Secondary Schools, Chapter II. New York, A. S. Barnes and Company, 1936.
32Cozens, Frederick W., Achievement Scales in Physical Education Activities for College Men. Philadelphia, Lea and Febiger, 1936.

Achievement Scales in Physical Education Activities for
Secondary School Girls and College Women. This study33
completes the achievement scale series as originally planned and
contains scales for a variety of team game elements for both
secondary school girls and college women. With senior high school
girls, as well as with college women, the factors of age, height and
weight have such a small relationship to performance that there is
no advantage to a classification scheme arranged on the basis of
these factors. Hence the group may be considered homogeneous and
scales have been arranged according to one classification only. This,
quite naturally, greatly simplifies the construction of achievement
scales. The list of sports for which scales have been formulated
includes archery, baseball, basketball, field hockey, running, soccer
and speedball, swimming, tennis and volleyball.

Physical Fitness Pentathlon.34 Scoring scales are provided
in fourteen events selected to measure aspects of physical fitness.
These are described in Chapter IX, Physical Fitness and Motor
Fitness Tests, page 176.


National Standards of Achievement for Girls and Boys

A group of leaders in national education and physical education
organizations gave careful study to the formulation of National
Physical Achievement Standards. The studies for both boys and
girls were sponsored by the National Recreation Association and
attempt to represent a cross-section of the school population of
the country.


33Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., Achievement Scales in Physical Education Activities for Secondary School Girls and College Women. New York, A. S. Barnes and Company, 1937.

34“The California Physical Fitness Pentathlon,” Bulletin of the California State Department of Education, Vol. XI, No. 8 (November, 1942), … California.


Standards for Boys.35 This booklet includes standards on a
pass or fail basis for five groups of boys: primary (ages eight and
nine), elementary (ages ten and eleven), intermediate (ages twelve
and thirteen), junior (ages fourteen and fifteen), and senior (ages
sixteen to nineteen). Activities are divided into four groups with
definite standards for each: (1) game skills, (2) track and field,
(3) gymnastics and (4) water sports. The standards represent
different levels of achievement for the various age groups. For
example, 60 per cent of the boys of the primary and elementary
groups are able to reach or exceed the standard; in the intermediate
and junior groups only 50 per cent can reach the standard; and at the
senior level only 40 per cent can qualify. In each of the four groups
of activities, a choice of certain individual events is given. The
questions which may be raised about this plan are these:

1. Since we know that height and weight as well as age affect
performance, why should individuals be grouped by age alone?

2. The pass or fail method, while administratively simple, does
not offer the possibilities of motivation contained in a scale which
provides a wide range of performance, or which has a number of
levels.

Standards for Girls.36 This study was sponsored by the
National Recreation Association in cooperation with the Society of
State Directors of Physical Education. Approximately 400,000
individual records were collected in three groups of activities: game
skills, self-testing activities and individual athletic activities. The
achievement standards when finally computed were put on a
percentile basis.

This will make it possible for a teacher to “place” student performance
from time to time on the basis of achievement and progress. The
standards will help to furnish motivation for the program in time
allotment, training of personnel, increased facilities, and
opportunities for participation. They will also assist the teacher to
rate her own curriculum, method and teaching achievement. The study
should be expected to answer the following questions. What skills
do girls at each age from eight to eighteen years possess?
What may be expected of girls of each of these age groups? What
shall the requirements of national achievement standards be?
Which activities are fundamentally most sound as a basis for
standards?37

35Braucher, H. S., Chairman, National Physical Achievement Standards. New York, National Recreation Association, 1931.

36 … “… Standards of Achievement for Girls,” Jr. Health and Phys. Educ., Vol. V, No. 9 (November, 1934), 29. See also the booklet and scoring tables: Amy R. Howland, National Physical Achievement Standards for Girls. New York, National Recreation Association, 1936. Reprint 1946.

While such a study as this has some value, at least two criticisms 
should be given consideration: 

1. The factors of height and weight as well as age have a consider- 
able influence upon performance up to age fifteen. Hence all three 
factors should be given consideration when a scoring scale is con- 
structed. Girls of sixteen years and over may be placed in one 
classification group because the factors of age, height and weight 
have a practically negligible effect upon performance.38

2. For purposes of motivation it is highly desirable to construct a 
scoring scale with a large number of divisions (100 if possible) so 
that small increases in performance may be noted. This is essential 
if the girl's interest in improvement is to be held. A scale of 9 points 
does not conform to such a criterion. 
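A percentile basis, like the one used for the girls' standards, is one way to obtain a scale with 100 divisions. A minimal sketch of computing a percentile rank against a reference sample. The function name and the convention of counting ties as half are our assumptions, and for timed events the values would first be negated so that faster times rank higher.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sample, value):
    """Percent of the reference sample below value, counting ties as half."""
    s = sorted(sample)
    below = bisect_left(s, value)                 # scores strictly below value
    equal = bisect_right(s, value) - below        # scores equal to value
    return 100.0 * (below + 0.5 * equal) / len(s)
```

For example, in a reference sample of [1, 2, 3, 4], a performance of 3 has two scores below it and one tie, giving a percentile rank of 62.5.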

Physical Performance Levels for High School Girls.39 In
response to a request by the United States Office of Education, the
Research Committee of the National Section on Women's Athletics
undertook to provide national performance standards in a series of
events designed to measure muscular control, coordination, speed,
agility of movement and strength. The items for the battery were
selected empirically out of the considerable experience of the
Committee with such test items. The items include: for general motor
ability, the standing broad jump, basketball throw, and potato race;
for strength of special muscle groups, sit-ups (abdominal muscles)
and pull-ups and push-ups (arm and shoulder girdle muscles); for
agility, the 10-second squat thrust; and for endurance, the 30-second
squat thrust. The items were also selected in light of administrative
ease and simplicity of equipment. A three standard deviation scoring
table was developed from data on a national sampling of high school
girls. The scoring table follows; to ensure accuracy, the exact
directions for administering each test must be followed.

37Ibid.
38See Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., Achievement Scales in Physical Education Activities for Secondary School Girls and College Women, p. 112. Also an unpublished study by Frederick W. Cozens.
39“Physical Performance Levels for High-School Girls,” Education for Victory, Washington, D. C., U. S. Office of Education, Vol. 3, No. 21 (May 3, 1945). Also “Physical Performance Levels for High-School Girls,” Jr. of Health and Phys. Educ., Vol. 16, No. 6 (June, 1945), 508-511.




TABLE VI

SCORING TABLE FOR PHYSICAL PERFORMANCE LEVELS
(1945 Revision)

Scale  Standing    Basketball  Potato  Pull-  Push-  Sit-  10-second     30-second
score  broad jump  throw       race    ups    ups    ups   squat thrust  squat thrust
 100   7-9         78          8.4     47     61     65    9-1           24
  95   7-7         75          8.6     45     58     61    9             23
  90   7-4         72          8.8     42     54     57    8-3           22
  85   7-2         68          9.0     39     51     54    8-1           21
  80   6-11        65          9.4     37     47     50    8             20
  75   6-9         62          9.6     34     45     46    7-3           19
  70   6-7         59          10.0    32     39     43    7-1           18-2
  65   6-4         56          10.2    29     36     39    7             18
  60   6-2         55          10.4    26     32     36    6-2           17
  55   6-0         50          10.6    24     28     33    6-1           16
  50   5-9         46          11.0    21     25     29    6             15
  45   5-7         45          11.2    18     21     25    5-2           14-2
  40   5-5         40          11.6    16     17     …     5-1           14
  35   5-2         37          11.8    15     15     18    4-3           13
  30   5-0         34          12.0    10     10     15    4-2           12
  25   4-9         31          12.4    8      6      11    4             11
  20   4-7         27          12.6    5      2      7     3-3           10
  15   4-4         24          13.0    5      1      5     3-2           9
  10   4-2         21          13.2    1      0      1     …             8-2
   5   4-0         18          13.4    0      0      0     2-3           7-2
   0   3-9         15          13.6    0      0      0     2-2           7
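Reading a performance off a table like Table VI amounts to finding the highest scale score whose standard has been met. A sketch for the potato race column, using the rows from 100 down to 15. It assumes the times are in seconds (lower is better) and that a performance must equal or better a listed standard to earn that score. The names POTATO_RACE and potato_race_score are hypothetical.

```python
# Potato race column of Table VI, rows 100 through 15: (scale score, time standard).
# Times are assumed to be in seconds; lower is better.
POTATO_RACE = [
    (100, 8.4), (95, 8.6), (90, 8.8), (85, 9.0), (80, 9.4), (75, 9.6),
    (70, 10.0), (65, 10.2), (60, 10.4), (55, 10.6), (50, 11.0), (45, 11.2),
    (40, 11.6), (35, 11.8), (30, 12.0), (25, 12.4), (20, 12.6), (15, 13.0),
]


def potato_race_score(seconds):
    """Highest scale score whose standard the time equals or betters, else None."""
    for score, standard in POTATO_RACE:  # rows are ordered from best to worst
        if seconds <= standard:
            return score
    return None  # slower than the rows included in this sketch
```

Under this reading, a time of 9.1 seconds misses the 85-point standard (9.0) but meets the 80-point standard (9.4), so it scores 80.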


Achievement Scales in Motor Fitness Events. Using the
height, weight and age classification scheme proposed by Cozens,
Trieb and Neilson,40 Knapp41 has developed achievement scales
for high school boys in six activities popularly used in physical
fitness programs. The activities scaled include push-ups, squat-
jumps, sit-ups (90-second limit), pull-ups, potato race, and V-lift.
The scales were constructed on the basis of 20,000 test records of
Wisconsin high school boys, and corrected for expected progress
on the basis of 2400 test and re-test records.

40Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., …
41Knapp, …, “Achievement Scales in Six Physical Education Activities for … School Boys,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 18, No. 5 (October, 1947), 187-197.

Standards were also made available for events in the Victory
Corps Program,42 the United States Office of Education wartime
physical fitness program for high school pupils.43, 44

The Iowa State course of study45 contains scoring tables for a
variety of motor fitness events including chinning, push-ups, squat- 
jumps, two-minute sit-ups, squat thrusts, sixty-second run, and 
shuttle run. The T-score tables are based upon a five standard 
deviation range. Directions are given for combining scores from 
any of the events tested. 
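A T-score places each raw performance on a scale with mean 50 and 10 points per standard deviation. One reading of a "five standard deviation range" is that the 0-to-100 scale covers five standard deviations on each side of the mean. The sketch below converts a set of raw scores under that reading. It uses the population standard deviation, and the lower_is_better flag for timed events is our assumption, not a detail of the Iowa tables.

```python
import statistics


def t_scores(raw, lower_is_better=False):
    """Convert raw performances to T-scores (mean 50, 10 points per SD).

    A 0-to-100 T-scale spans five standard deviations on each side of the mean.
    """
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    sign = -1.0 if lower_is_better else 1.0  # faster times should score higher
    return [50.0 + 10.0 * sign * (x - mean) / sd for x in raw]
```

Since every event is expressed on the same scale, T-scores from several events can be averaged to combine them. That is one plausible way of carrying out the combining of scores the course of study mentions.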

Additional achievement scales in motor fitness tests and indi- 


vidual items are described in Chapter IX, Physical Fitness and 
Motor Fitness Tests. 


Selected References 


COZENS, FREDERICK W.: Achievement Scales in Physical Education Activities for College Men. Philadelphia, Lea and Febiger, 1936. Pp. 118.

Achievement scales in thirty-five individual events for college men have 
been constructed on the same statistical plan as was used in the elementary 
and secondary school studies. Scales for 9 height-weight classification groups 
in each event are shown. 

COZENS, FREDERICK W., TRIEB, MARTIN H. and NEILSON, N. P.: Physical Education Achievement Scales for Boys in Secondary Schools. New York, A. S. Barnes and Company, 1936. Pp. vi and 155.

Standardized directions and achievement scales in forty-five events are given. Scales have been constructed on the same basis as the scales for the elementary study.


⁴²See also p. 174 for listing of events.


⁴³Morgan, Cecil W., A Study to Establish Standards for Certain Achievement Tests in the Victory Corps Physical Fitness Program. Unpublished doctoral dissertation, University of Pittsburgh, 1944.


⁴⁴United States Office of Education, “Scales for Tests for High School Boys of Strength of the Abdomen and Back,” Education for Victory, Vol. 3, No. 4 (August 21, 1944), 4.

⁴⁵The State of Iowa, The Iowa Program of Physical Education for Boys, Secondary Schools (Tentative Edition). Department of Public Instruction, State of Iowa, 1945, pp. 243-358.




McCLOY, C. H.: The Measurement of Athletic Power, Chapter II, “Scoring Tables for the Measurement of Athletic Performance,” pp. 9-37. New York, A. S. Barnes and Company, 1932. Pp. xiv and 178.

Every student of measurement should be familiar with McCloy’s technique 
in the development of a basis for establishing an increased-increment scoring 
plan for heterogeneous groups. The Appendix contains scoring tables for boys 
in fifty events largely involving track and field performance. 

NEILSON, N. P. and COZENS, FREDERICK W.: Achievement Scales in Physical Education Activities for Boys and Girls in Elementary and Junior High Schools. New York, A. S. Barnes and Company, 1934. Pp. x and 171.

This book contains standardized directions and achievement scales in thirty- 
three individual athletic events for boys and twenty for girls. Statistical 
techniques used in scale construction are discussed. 

COZENS, FREDERICK W., CUBBERLEY, HAZEL J. and NEILSON, N. P.: Achievement Scales in Physical Education Activities for Secondary School Girls and College Women. New York, A. S. Barnes and Company, 1937.

Achievement scales in nine sports for senior high school girls and college women with a total of seventy-nine different tests. Included also in this volume are achievement scales for junior high school girls in twenty individual events embracing five sports.


CHAPTER VI 


Indices for the Classification 


or Grouping of Students 


History of the Problem. The proper grouping of students in 
physical education activities has been a concern of the leaders in the 
profession for nearly seventy years. One of the dominant ideas 
behind the establishment of the old intercollegiate strength test was 
the matter of proper classification for competition in athletic sports. 
While more recent concern takes into consideration such a grouping, 
it has broadened its horizon and now is conscious of divisions for 
other purposes than interscholastic or intercollegiate competition. 

For a considerably longer period than thirty years, the profession 
has recognized the desirability of equalizing, in so far as practicable, 
the physical differences which exist among individuals of the same 
age group. The range of size, maturity and performance ability of 
children of a given age presents a problem in physical education 
which, if left unsolved, produces many difficulties, among which are 
a lack of interest on the part of the smaller and weaker children, and 
a real physical hazard for these same pupils while in competition 
with those who are more mature and of superior size and strength. 

Prior to 1917, a number of attempts in various parts of the country were made to group boys particularly in such a way as to minimize the possibility of having large, mature boys compete against smaller mature boys and of having large, immature boys compete against smaller immature boys. At first, weight teams of various kinds were organized. There were 90-pound teams, lightweights, middleweights, heavyweights, etc. Later on a more logical view of the situation gave rise to the development of a considerable number of ways of grouping with emphasis upon one or more of the factors of age, grade, height and weight.

While it might be argued that there are many other factors 
besides age, height and weight which may have a significant bearing 
upon performance, it is still a fact that a usable scheme must be 
administratively simple and that to include such items as length of 
trunk, length of arms and legs, width of shoulders and hips, depth 
and girth of chest, and a host of others, would complicate the classi- 
fication scheme beyond all reason. 

Neilson¹ made a constructive analysis of a large number of factors which may determine the performance of boys and girls in physical education activities, and came to the conclusion that there are two groups of factors causal to athletic performance: (1) “those to be employed in a pupil classification scheme,”² and (2) those which are developed through activity. In the latter category may be placed such factors as intelligence, emotion, skill, endurance and the like, and these will operate to determine differences in pupil achievement. In a classification plan, however, the growth process is highly important, especially during the adolescent period, and for all practical purposes age, height and weight must be considered basic factors.

In 1917 Reilly³ reported a plan using all four of these factors, and it seemed so logical that it was quite generally adopted throughout the United States and has been given the name “four-point classification.” Its use in the State of California and the recommendation given to it by Hetherington in the first California State Manual⁴ had much to do with its wide use for a period of fifteen years. Reilly's plan, although more or less empirically set up, had an experimental background and, while designed especially for the fifth, sixth, seventh and eighth grades, was modified by … Stolz⁵ to take care of the age level in the senior high school. This revision was so highly regarded that it was adopted by the California Interscholastic Federation⁶ and was in constant use until 1935.

¹Neilson, N. P., A Study of Achievement in Selected Athletic Events. Unpublished doctor's dissertation, University of California, 1936.
²Ibid.
³Reilly, Frederick J., “A Rational Classification of Boys and Girls for Athletic Competition,” Am. Phys. Educ. Rev., XXII (January, 1917), 13-24.
⁴… Manual in Physical Education for the … of the State of California, Part IV, p. 98. Sacramento, California State Printing Office.
⁵Stolz, Herbert R., Bulletin of the California State Board of Education, Department of Physical Education, M-4, November 15, …

McCloy’s Studies in Athletic Handicapping.⁷ In 1927
McCloy reported a study in which he evaluated the factors of age, 
height and weight in relation to their influence upon performance. 
This study with boys showed that the best combination of the 
factors to use in a formula for homogeneous grouping is 


Index = 8A + 1½H + W


in which A represents the age in years, H the height in inches and W 
the weight in pounds. The multiple correlation coefficient between 
the three factors and performance runs close to .7, a very satisfactory 
relationship for purposes of classification. 

In 1928 a study by Delaney,⁸ under McCloy’s direction, was
made on girls between the ages of ten and sixteen to determine the 
significance of the factors of age, height and weight in relation to 
performance. Because of certain difficulties in relating physiologic 
age to chronologic age, it was thought best to set up two formulas: 


For girls fourteen years and under: Index = 10A + W
For girls fifteen years and over: Index = 10A + H


The multiple correlation coefficients between the three factors 
and performance are much lower than for boys, and a conclusion 
which seems quite obvious might be that these factors have much 
less influence upon performance with girls than with boys. 

A further study by McCloy in 1932 brought forth a classification 
index differing somewhat from the one devised in the 1927 study, 
but based upon more data and a searching investigation throughout 
a wide age range. His classification index of 20A + 6H + W em- 
phasizes the importance of height and shows that much less stress 
should be placed on weight than was given in the 1927 study. 
⁶An organization composed of high school administrators whose duty it is to centralize control in the conduct of high school athletics in the state. Organized March 28, 1914.

⁷McCloy, Chas. H., “Athletic Handicapping by Age, Height and Weight,” Am. Phys. Educ. Rev., XXXII (November, 1927), 635-642.
McCloy, Chas. H., The Measurement of Athletic Power. New York, A. S. Barnes and Company, 1932.
McCloy, Chas. H., Tests and Measurements in Health and Physical Education, pp. 45-48. New York, F. S. Crofts and Company, 1942.

⁸Delaney, Mary, “Age, Height, Weight and Pubescent Standards for the Athletic Handicapping of Girls,” Am. Phys. Educ. Rev., XXXIII (October, 1928), 507-509.


Because age ceases to be as important a factor after 17, for those over this age the formula is 6H + W. McCloy further brings forth conclusive evidence that the factor of school grade, previously considered important in grouping children, contains no information not already to be found in the other three. Carpenter⁹ concludes that McCloy’s formula (20A + 6H + W) is the best one to use in classifying first, second and third grade boys and girls.
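A minimal sketch of McCloy's 1932 index as summarized above, with A in years, H in inches and W in pounds. Treating "over 17" as any age strictly greater than 17, and the sample pupils, are assumptions for illustration.

```python
def mccloy_index(age_years, height_in, weight_lb):
    """McCloy's 1932 classification index (age in years, height in inches, weight in pounds)."""
    if age_years > 17:
        # Beyond 17, age is dropped and the index becomes 6H + W.
        return 6 * height_in + weight_lb
    return 20 * age_years + 6 * height_in + weight_lb

# Hypothetical 12-year-old, 58 inches tall, weighing 85 pounds.
print(mccloy_index(12, 58, 85))   # 240 + 348 + 85 = 673
# Hypothetical 18-year-old, 70 inches, 150 pounds: age no longer counts.
print(mccloy_index(18, 70, 150))  # 420 + 150 = 570
```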

The California Studies. Just as McCloy’s work was published, it so happened that a thorough investigation of California’s classification schemes was in progress. The first of these studies, with elementary school boys and girls, served to verify McCloy’s findings as well as to establish the validity of the California scheme of classification.¹⁰ As will be noted, the index for classifying elementary school children in California (20A + 5.5H + 1.1W) closely resembles McCloy’s classification index.

Another California study relating to the classification of boys from ten to sixteen years of age gave further evidence toward verifying McCloy’s index, despite the lower emphasis on height (20A + 4.33H + W).¹¹ The method used in arriving at this index consisted in striking an average covering the indices of the separate events included in the program. Specifically, these events are the dash, the running broad jump, the running high jump, the two-minute basketball goal throw and the ball put, an event resembling the shot-put, but using a 5-pound leather-covered ball, 17 inches in circumference.
⁹Carpenter, Aileen, “Strength Testing in the First Three Grades,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 3 (October, 1942), 328-332.
¹⁰Cozens, Frederick W. and Neilson, N. P., “Age, Height and Weight as Factors in the Classification of Elementary School Children,” Jr. of Health and Phys. Educ., Vol. 3, No. 10 (December, 1932), 21, 58.
¹¹Cozens, Frederick W., Junior Pentathlon, 1933. Los Angeles, The Los Angeles Times, 1933.
¹²Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., “The Classification of Secondary School Boys for Purposes of Competition,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 1 (March, 1936), 36-43.

The study involving a classification scheme for secondary school boys¹² proceeded in a manner similar to that in which the index for elementary school boys and girls was established. Although it might be supposed that the elementary school index could be used for classifying secondary school boys, it was felt that two important items might make such a procedure questionable, namely: (1) the higher age range of the secondary school group, and (2) the type of activities used by this higher age range. The activities of the secondary school period involve events where weight and strength play a rather prominent part. As in the elementary study a best-fit index was computed, taking into consideration five fundamental groupings of events according to type. In the following indices, A refers again to age in years, H to height in inches and W to weight in pounds:


Running Index = 2A + 0.978H + 0.126W
Jumping Index = 2A + 0.502H + 0.113W
Throwing Index = 2A + 0.207H + 0.177W
Index of Events of Weight and Strength = 2A + 0.548H + 0.199W
Kicking Index = 2A + 0.141H + 0.187W

Best-fit Index = 2A + 0.475H + 0.16W


This best-fit index has been checked by computing an index for 
each of three groups of boys in a composite of seven events. The 
average of these three indices gave a Best-fit Index of 2A + 0.481H 
+ 0.16W, closely approximating that shown above. 
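The coefficients in the table are consistent with the best-fit index being the simple average of the five type indices: the height coefficients average 0.475 and the weight coefficients about 0.160. The text does not say that averaging was the method used in the secondary study (it does say so for the Junior Pentathlon index), so the following is best read as a check of the arithmetic rather than a statement of the authors' procedure.

```python
# (height coefficient, weight coefficient) for each type index; the age coefficient is 2 throughout.
type_indices = {
    "running": (0.978, 0.126),
    "jumping": (0.502, 0.113),
    "throwing": (0.207, 0.177),
    "weight_and_strength": (0.548, 0.199),
    "kicking": (0.141, 0.187),
}

def average_index(indices):
    """Average the height and weight coefficients across several type indices."""
    n = len(indices)
    h = sum(c[0] for c in indices.values()) / n
    w = sum(c[1] for c in indices.values()) / n
    return h, w

h, w = average_index(type_indices)
print(f"Best-fit Index = 2A + {h:.3f}H + {w:.2f}W")  # Best-fit Index = 2A + 0.475H + 0.16W
```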

The question may be raised as to why two times the age was 
selected instead of 10A or 20A. The answer is simply this, that 
administratively it is impractical to attempt to classify boys more 
than twice a year, or at the beginning of each semester. The 2A will
then allow for a change in index or exponent value of one for each 
six-month period. 

It should be pointed out that to be truly scientific a classification plan should be devised for each year of chronological age and for each event in which the pupil competes. Though such a scheme would be utterly impractical from an administrative point of view, the lack of it may be doing the pupil a real injustice. However, if we find that a best-fit classification index (such as 2A + 0.475H + 0.16W) has a very high relationship with an index specifically designed for a particular age and event, it may be concluded that the best-fit classification plan serves adequately despite the refined technique of the specific index. Neilson¹³ has found that there are extremely high relationships between the best-fit (or general) classification scheme and the specific index at ages below nineteen. He has also shown that specific classification indices for a particular age in two different events are very closely related. The coefficients of correlation run between .940 and .990 except in a very few instances and average .950 or better. It may be assumed from this study that a general classification scheme covering an age range of five to six years is justifiable and that achievement scale scores computed for the two classification schemes will be highly related.

¹³Neilson, N. P., op. cit.
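Neilson's comparison rests on the ordinary product-moment correlation between the values two formulas assign to the same pupils. The sketch below computes that coefficient for the Running Index (standing in for a "specific" index) and the Best-fit Index (the "general" one) over a handful of hypothetical boys; the pupil records are invented for illustration and say nothing about the size of Neilson's coefficients.

```python
from math import sqrt

def pearson(xs, ys):
    """Product-moment correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def index(coeffs, age, height, weight):
    a, h, w = coeffs
    return a * age + h * height + w * weight

RUNNING = (2, 0.978, 0.126)   # specific index for running events
BEST_FIT = (2, 0.475, 0.16)   # general (best-fit) index

# Hypothetical pupils: (age in years, height in inches, weight in pounds).
pupils = [(13, 58, 85), (14, 61, 98), (15, 64, 115),
          (16, 66, 130), (17, 69, 150), (15, 62, 140)]

running = [index(RUNNING, *p) for p in pupils]
best_fit = [index(BEST_FIT, *p) for p in pupils]
print(round(pearson(running, best_fit), 3))
```

With real records the check would be repeated for each age and event, as the paragraph describes.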

Grouping of College Men. Two studies in the field point very 
definitely to the fact that with men of the college age level, the 
factor of age may be entirely disregarded because it contributes 
nothing to performance ability.¹⁴ The two factors of height and
weight, however, do have varying amounts of influence upon per- 
formance and for purposes of class competition and intramural 
sports should be given consideration. 

McCloy’s index for classifying boys over seventeen is: 

Index = 6H + W


There is no question about the validity of this scheme, but, in 
view of the fact that the stature type of college men changes 
relatively little, the scheme set forth by Cozens¹⁵ offers certain
advantages. 

His scheme consists of grouping men into one of nine different 
stature classes according to the individual's height and weight. 
The grouping automatically sets up nine classes: 


Tall Slender Medium Slender Short Slender 
Tall Medium Medium Medium Short Medium 
Tall Heavy Medium Heavy Short Heavy 


The mention of a man’s classification as “Tall Slender” immediately gives the instructor a picture of the individual's type of stature and is useful as a name.
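Cozens' actual chart is not reproduced here, so the height and weight limits below are placeholders chosen only to show the mechanics: a height band and a weight band together pick out one of the nine named classes. Letting the weight limits differ by height band is likewise an assumption of this sketch, not a description of Cozens' published chart.

```python
# Placeholder limits (NOT Cozens' published values), for illustration only.
HEIGHT_CUTS = (67.0, 71.0)  # inches: under 67 Short, 67 to under 71 Medium, 71 and over Tall
# Hypothetical weight limits (pounds) within each height band:
# under the first Slender, under the second Medium, otherwise Heavy.
WEIGHT_CUTS = {
    "Short": (130.0, 150.0),
    "Medium": (145.0, 165.0),
    "Tall": (160.0, 185.0),
}

def band(value, cuts, labels):
    """Place a value into one of three labelled bands using two cut points."""
    low, high = cuts
    if value < low:
        return labels[0]
    if value < high:
        return labels[1]
    return labels[2]

def stature_class(height_in, weight_lb):
    """Return one of the nine class names, e.g. 'Tall Slender'."""
    h = band(height_in, HEIGHT_CUTS, ("Short", "Medium", "Tall"))
    w = band(weight_lb, WEIGHT_CUTS[h], ("Slender", "Medium", "Heavy"))
    return f"{h} {w}"

print(stature_class(72, 158))  # Tall Slender under these placeholder limits
```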

Grouping of High School Girls. Because of the limitations of age, height and weight, either singly or in combination, as classifiers for girls and women, Bookwalter¹⁶ suggests a classification plan for high school girls utilizing Cozens’ nine classes named in the preceding paragraph. A table showing the limits of each classification and based upon 19,000 cases is given. The plan tends to put too many girls in the tall groups and the medium medium group. The author also reports a not too satisfactory validity for this technique, but points out its advantages from the standpoint of its simplicity and the lack of other valid methods.

¹⁴Cozens, Frederick W., “A Study of Stature in Relation to Physical Performance,” Res. Quart. Am. Phys. Educ. Assoc., Vol. I, No. 1 (March, 1930), 38-45.
McCloy, Chas. H., The Measurement of Athletic Power, p. 95. New York, A. S. Barnes and Company, 1932.
¹⁵Cozens, Frederick W., Achievement Scales in Physical Education Activities for College Men. Philadelphia, Lea and Febiger, 1936.
¹⁶Bookwalter, Karl W., “An Assessment of the Validity of Height-Weight Class Division for High School Girls,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 2 (May, 1944), 145-149.

A Comparison of Age-Height-Weight Classification Indices.
In order to compare the various indices which have been brought 
forward to classify students below the college age level, it will be 
valuable to reduce them all to terms such that one factor is constant 
throughout. Since McCloy's index was the first scientific one in the 
field, the age factor in it, namely, 20, has been kept constant. The 
reduction of Reilly's presentation in table form to an index was 
made possible by taking a cross-section of his exponent values for 
each of the four factors considered. In this transposition, the factor 
of school grade was considered to be in the same category as age and 
was added to age. 


1. 1917, Reilly (an approximation): 20A + 1.50H + .95W
2. 1922, California Secondary Boys: 20A + 2.00H + 1.375W
3. 1927, McCloy: 20A + 3.75H + 2.50W
4. 1932, McCloy: 20A + 6H + W
5. 1932, Cozens and Neilson: 20A + 5.5H + 1.1W
6. 1932, Cozens (Junior Pentathlon)¹⁷: 20A + 4.33H + W
7. 1935, Cozens, Trieb and Neilson (New California Classification for Secondary Boys): 20A + 4.75H + 1.60W

It is quite apparent that the classification scheme of Reilly and 
that for secondary boys in California (1922) were decidedly lacking 
in the emphasis given to the factor of height. The early study of 
McCloy overemphasized the influence of weight, probably due to 
his Chinese data. It would seem from a comparison of the elemen- 
tary and secondary formulas (4), (5), (6) with (7) that height has less 
influence upon performance and weight more influence with second- 
ary boys than with elementary. 
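The reduction used in the list can be sketched as a rescaling: convert the age term to years if necessary, then multiply every coefficient by 20 divided by the age coefficient. Applied to McCloy's 1927 index (8A + 1.5H + W), to the secondary best-fit index (2A + 0.475H + 0.16W) and to the Junior Pentathlon index as actually used (age in months, footnote 17), it reproduces the McCloy 1927, Cozens-Trieb-Neilson and Junior Pentathlon entries. The function name and the rounding to two decimals are illustrative choices.

```python
def to_twenty_age(age_coef, height_coef, weight_coef, age_in_months=False):
    """Rescale an age-height-weight index so that age, in years, carries a coefficient of 20."""
    per_year = age_coef * 12 if age_in_months else age_coef  # convert a per-month weight to per-year
    k = 20 / per_year
    return (20.0, round(height_coef * k, 2), round(weight_coef * k, 2))

print(to_twenty_age(8, 1.5, 1))                          # McCloy, 1927: (20.0, 3.75, 2.5)
print(to_twenty_age(2, 0.475, 0.16))                     # secondary best-fit: (20.0, 4.75, 1.6)
print(to_twenty_age(0.77, 2, 0.46, age_in_months=True))  # Junior Pentathlon: (20.0, 4.33, 1.0)
```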

Other Indices for Use in Homogeneous Grouping. It is often desirable to use other factors than age, height and weight for homogeneous grouping of students. Many of the general motor ability tests were originally designed for the purpose of classifying students. Other tests can be similarly used. The objectives of the activity concerned and the administrative factors involved will determine the grouping technique used. Logically, even skill tests can be used for classification. Thus, a swimming skill test can determine groupings for elementary, intermediate and advanced swimming classes. The concern in this chapter, however, is with the over-all problem of general grouping. Other formulae have been advanced to serve purposes similar to those of the age-height-weight indices, that is, of providing a simple, effective means of quickly grouping large numbers of pupils for instruction or competition.

¹⁷This classification index as actually used was: Index = .77 age (in months) + 2 height (in inches) + .46 weight (in pounds).

Strength Indices as Classifiers. One of the uses to which Rogers¹⁸ puts his strength index is in equalizing the abilities of competing individuals or teams by grouping together boys of nearly equal strength. Because of the positive relationship between strength and motor ability, Stansbury,¹⁹ Carpenter,²⁰ and Larson²¹ similarly suggest the use of their strength indices as a basis for classifying students. Bookwalter, et al.,²² recommend that Larson's Dynamic Strength Test²³ be used as a primary classifier, and that the three groups thus obtained be again divided at the median score in McCloy’s Classification Index.²⁴
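The paragraph does not say how the three strength groups are formed. The sketch below assumes equal thirds by rank on the strength test, then splits each third at its own median classification index, giving six groups; the pupil records, the use of each third's own median, and the rule that pupils at or above the median go to the upper half are assumptions for illustration.

```python
from statistics import median

def two_stage_groups(pupils):
    """Split pupils into six groups: strength thirds, each halved at its classification-index median.

    pupils: list of (name, strength_score, classification_index); needs at least three pupils.
    """
    ranked = sorted(pupils, key=lambda p: p[1], reverse=True)  # strongest first
    n = len(ranked)
    thirds = (ranked[: n // 3], ranked[n // 3 : 2 * n // 3], ranked[2 * n // 3 :])
    groups = {}
    for level, third in zip(("high", "middle", "low"), thirds):
        cut = median(p[2] for p in third)
        groups[(level, "upper")] = [p[0] for p in third if p[2] >= cut]
        groups[(level, "lower")] = [p[0] for p in third if p[2] < cut]
    return groups

# Hypothetical boys: (name, strength test score, McCloy classification index).
boys = [("A", 90, 600), ("B", 85, 650), ("C", 70, 560),
        ("D", 65, 590), ("E", 50, 520), ("F", 40, 540)]
for key, names in two_stage_groups(boys).items():
    print(key, names)
```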

¹⁸Rogers, Frederick Rand, Physical Capacity Tests in the Administration of Physical Education. New York, Teachers College, Columbia University, Contributions to Education, No. 173, 1925.
¹⁹Stansbury, Edgar, “A Simplified Method of Classifying Junior and Senior High School Boys into Homogeneous Groups for Physical Education Activities,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 4 (December, 1941), 765-776.
²⁰Carpenter, Aileen, op. cit.
²¹Larson, Leonard A., “A Factor Analysis of Strength Variables and Tests with a Test Combination of Chinning, Dipping, and Vertical Jump,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 4 (December, 1940), 82-96.
²²Bookwalter, Karl W., Ballin, Ralph and Bookwalter, Carolyn W., “A Simple, Economical, and Valid Administrative Ability-Grouping of High School Boys for Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 4 (December, 1942), 512-519.
²³Larson, Leonard A., op. cit.
²⁴McCloy, Chas. H., op. cit.
²⁵Kistler, J. W., “A … Study of Methods of Classifying Pupils into Homogeneous Groups for Physical Education,” Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1 (March, 1934), 42-48.
Kistler, J. W., “The Establishment of Bases for Classification of Junior and Senior High School Boys into Homogeneous Groups for Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. VIII, No. 4 (December, 1937), 11-18.

Motor Ability Indices as Classifiers. Kistler’s²⁵ studies present a validated plan by which grouping may be accomplished. From thirty-one test items giving consideration to size, strength, educability, specific skills and motor ability of a general nature, nineteen …

1. Grouping Index = Burpee Score − 4 Dodge Run + 0.5 Shot-put + 0.016 Classification Index. R = .954.
2. Grouping Index = Standing Broad Jump + 0.4 General Motor Capacity + 0.03 Strength Index + 7 Shuttle Run. R = .924.
3. Grouping Index = Standing Broad Jump + 6.5 Burpee + 7 Shuttle Run + 0.2 Classification Index. R = .920.


The following statements concerning the use of the batteries are pertinent:²⁶

Battery No. 1 is recommended for use when efficient classification is the major consideration.

Battery No. 2 will serve about as well as a classifying device and has the advantage of supplying a maximum amount of desirable information about the boy.

Battery No. 3 is probably best suited for general use. This …

Larson's²⁷ indoor and outdoor motor ability tests include devices for use with the indices for classification purposes. These tests are described on page 131.

Stansbury²⁸ suggests three simplified formulae for classifying junior and senior high school boys homogeneously for physical education activities.

²⁶Kistler, J. W., “… Classifying Pupils into …”
²⁷Larson, Leonard A., “A Factor Analysis of Motor Ability Variables and Tests, with Tests for College Men,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 3 (October, 1941), 499-517.
²⁸Stansbury, Edgar, op. cit.


A power quotient is obtained by comparing battery norms to
McCloy’s Classification Index norms. Tables for this purpose are 
available. 

The Computation of Classification Indices or Formulas. 
The computation of a classification index using the factors of age, 
height and weight involves the solution of a four-variable problem 
in partial and multiple correlation techniques. So that the student 
may be familiar with the techniques used, a sample problem is 
offered on pages 288-290. 
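The sample problem on pages 288-290 is not reproduced here. As a rough modern equivalent, the sketch below fits least-squares weights for age, height and weight against a performance score by solving the normal equations, and reports the multiple correlation R between the fitted and actual scores. This is ordinary multiple regression and may differ in detail from the book's partial-correlation worked example; the pupil data are hypothetical.

```python
from math import sqrt

def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_index(rows, scores):
    """Least-squares weights for (age, height, weight) -> score, plus the multiple R.

    Returns ([intercept, b_age, b_height, b_weight], R).
    """
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, scores)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    # Multiple R is the correlation between fitted and observed scores.
    n = len(scores)
    my, mf = sum(scores) / n, sum(fitted) / n
    cov = sum((f - mf) * (y - my) for f, y in zip(fitted, scores))
    var_f = sum((f - mf) ** 2 for f in fitted)
    var_y = sum((y - my) ** 2 for y in scores)
    return beta, cov / sqrt(var_f * var_y)

# Hypothetical pupils: (age in years, height in inches, weight in pounds) and a performance score.
rows = [(10, 54, 70), (11, 55, 80), (12, 58, 78), (13, 60, 95),
        (14, 63, 100), (15, 62, 120), (16, 66, 118), (12, 56, 90)]
scores = [43, 47, 49, 53, 57, 61, 63, 50]

beta, R = fit_index(rows, scores)
_, b_age, b_height, b_weight = beta
# Express the weights as an index whose age coefficient is 20, as the chapter does.
print(f"Index = 20A + {20 * b_height / b_age:.2f}H + {20 * b_weight / b_age:.2f}W   (R = {R:.3f})")
```

Scaling by the age weight only changes the units of the index, not the ordering of pupils, which is all a classification scheme needs.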


Selected References 


BOOKWALTER, KARL W.: “A Critical Evaluation of Some of the Existing Means of Classifying Boys for Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 3 (October, 1939), 119-127.

Analyzes the Classification Index, Force Index, General Athletic Ability, 
Individual Athletic Events, Motor Ability, Physical Capacity Index, Strength 
Index, and Physical Fitness Index relative to their comparable merits as classi- 
fiers. Also discusses principles of classifying students and describes a primary 
and secondary classification plan. 

BOOKWALTER, KARL W., BALLIN, RALPH and BOOKWALTER, CAROLYN W.: “A Simple, Economical, and Valid Administrative Ability-Grouping of High School Boys for Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 4 (December, 1942), 512-519.

A further study exploring the use of two disparate classifiers, Larson's Dy- 
namic Strength Test and McCloy’s Classification Index, as primary and secon- 
dary classifiers. 

COZENS, FREDERICK W.: Achievement Scales in Physical Education Activities for College Men, Chapter I, “Classification of College Men,” pp. 7-12. Philadelphia, Lea and Febiger, 1936. Pp. 118.

The grouping of college men into nine height-weight divisions is discussed and 
a chart for such grouping is presented. 

COZENS, FREDERICK W. and NEILSON, N. P.: “Age, Height and Weight as Factors in the Classification of Elementary School Children,” Jr. Health and Phys. Educ., Vol. 3, No. 10 (December, 1932), 21, 58.

The classification index computed here closely resembles McCloy’s index and gives further evidence of its validity.

COZENS, FREDERICK W., TRIEB, MARTIN H. and NEILSON, N. P.: Physical Education Achievement Scales for Boys in Secondary Schools, Chapter II, “The Classification of Secondary School Boys,” pp. 10-15. New York, A. S. Barnes and Company, 1936. Pp. vi and 155.

The best classification device for secondary school boys puts less emphasis on height and more on weight. This shift in emphasis is undoubtedly rather closely connected with (1) the higher age range of the secondary school group and (2) the types of activities used by such a group.

McCLOY, C. H.: The Measurement of Athletic Power, Chapter V, “Athletic Classification and Handicapping by Age, Height and Weight,” pp. 63-95. New York, A. S. Barnes and Company, 1932. Pp. xiv and 178.

In this chapter is developed the technique by which McCloy establishes the validity of the Classification Indices (20A + 6H + W) and (6H + W).


CHAPTER VII 
The Measurement of General 


Qualities — Strength and Power 


The next two chapters include a number of tests used to measure 
certain general qualities which underlie performance in physical 
activities. The groupings include tests of strength; power; motor ability, capacity and educability; and neuromuscular control. In all probability no one of these tests of general qualities measures the entire amount of the quality under consideration, but each has certain values for diagnosing physical performance and for predicting undeveloped abilities.


Strength Tests 


a great deal of the intercollegiate competition of these early eighties 
was in the form of gymnastic teams and feats of strength, it is quite 
natural that emphasis in the mind of student as well as instructor 
should be on muscular strength. 

Sargent's Test. Sargent’s strength test took into account not only strength but other attributes of physical well-being, such as good circulation, vigorous heart and lungs, and a well developed and well controlled central nervous system. Inasmuch as few have had experience in the elaborate routines of strength-testing as formerly practiced, it might be interesting to review some of the principal components of such an examination.

The strength of back and legs is measured by a spring dynamom- 
eter. The individual stands on a small platform and then pulls 
on the handles of the dynamometer which is securely fastened to the 
platform. The dial on the dynamometer indicates the pull in kilo- 
grams. Leg strength is measured in a similar fashion except that 
the individual may sit and put the handles of the dynamometer on 
his two thighs. The dial again indicates his ability to push up with 
his legs. 

The grasping power of the right and left forearm is measured by
means of a hand dynamometer of the type which is used today. 
Here again the result is recorded in kilograms. 

The capacity of the lungs has been previously tested by means 
of a wet spirometer and the result recorded in cubic centimeters. 
Also, the propulsive effort of the expiratory muscles is measured by a 
manometer registering a capacity in terms of hectograms (100 gm.). 
An alternate test was later used in which the strength of the chest 
muscles was tested by means of a chest dynamometer which meas- 
ured the push of the hands on the knobs of this dynamometer held 
in front of the chest. When the intercollegiate strength test was 
adopted in 1897 by fifteen colleges and universities, one-twentieth
of the lung capacity might be substituted for lung strength.¹

Strength of the upper arms (triceps) and chest was measured by dipping on the parallel bars. The bars were set at a height of 5 feet and were 18 inches apart (inside measurement). On the dip the chin should touch a cord suspended 3 inches above the level of the top of the bars. Dipping was followed almost immediately by chinning to give the element of endurance. Chinning measured the strength of the upper arms (biceps) and back. The horizontal bar used was from 1 to 1¾ inches in diameter and was suspended 8 feet from the floor. Either the ordinary or reverse grasp could be used. At first this pull-up was accomplished on the rings. The number of dips and pull-ups was multiplied by one-tenth of the weight to reduce the number of figures entered on the chart and also to take account of the weight which each individual must lift. The total score was obtained by adding all of these records.
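The scoring just described amounts to a sum of the separate records. The sketch below assumes the dynamometer readings are already in the units named above (kilograms for back, legs and grip; hectograms for lung strength), uses one-twentieth of lung capacity in cubic centimeters when lung strength is not measured, and multiplies dips plus pull-ups by one-tenth of body weight, with weight in kilograms as an assumption. The function name and the sample record are hypothetical.

```python
def sargent_total(back_kg, legs_kg, grip_right_kg, grip_left_kg,
                  dips, pull_ups, body_weight_kg,
                  lung_strength_hg=None, lung_capacity_cc=None):
    """Total strength score in the style of the intercollegiate strength test (sketch)."""
    if lung_strength_hg is not None:
        lungs = lung_strength_hg
    elif lung_capacity_cc is not None:
        lungs = lung_capacity_cc / 20  # substitute permitted under the 1897 intercollegiate test
    else:
        raise ValueError("need lung strength or lung capacity")
    arms = (dips + pull_ups) * body_weight_kg / 10  # dips and pull-ups times one-tenth of weight
    return back_kg + legs_kg + grip_right_kg + grip_left_kg + lungs + arms

# Hypothetical record: 437 kg of dynamometer readings, 4000 cc lung capacity, 22 dips and pull-ups at 70 kg.
print(sargent_total(150, 200, 45, 42, dips=10, pull_ups=12,
                    body_weight_kg=70, lung_capacity_cc=4000))  # 437 + 200 + 154 = 791.0
```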

¹Sargent, Dudley A., “Intercollegiate Strength Tests,” Am. Phys. Educ. Rev., II (December, 1897), 216.


Francis Galton’s Test.² In connection with this strength test
idea, Francis Galton in 1890 attempted to find a trustworthy method 
of measuring physical efficiency from the standpoint of usefulness 
in civil service and in business. His test included: 


1. Breathing capacity. 

2. Strength tests with reference to stature and weight. 
3. Quickness of response to a signal by the eye or ear.
4. Eye-sight. 

5. Hearing. 

6. Color sense. 


“This principle,” says Meylan, “had wide application in England 
and the United States.” It is interesting to note that Galton recog- 
nized the inadequacy of strength measurements to give a true 
picture of efficiency, and he introduced elements of coordination 
and accuracy which appear again only in some of the most recent 
attempts to define physical ability. 

The Ergograph. The ergograph, originally developed in 1884 
by Mosso, has been modified repeatedly to conform to the problem 
under investigation, though without changing the principle. It 
consists of a support for the reception of the forearm and a weight 
suspended from a pulley wheel which is connected by means of a 
thread with the index or middle finger of the subject, and is equipped 
with a lever adjusted to write upon the smoked paper of a kymo- 
graph. “The character of the successive muscular contractions 
obtained by flexing and relaxing the finger was employed as an index 
of estimating the muscular power and general condition of the 
person.”³

Kellogg's Dynamometer.⁴ Kellogg's interest lay primarily in
exercise as a therapeutic measure. It was his desire to find an
instrument which would accurately measure the strength of various
muscle groups. His work with various dynamometers finally led to the
invention of an instrument which he called the Universal
Dynamometer, and with which he was able to test the strength of a
large variety of muscle groups. These include:

Nine muscle groups of each of the upper extremities.
Eight muscle groups of each of the lower extremities.
Four muscle groups of the trunk.
Four muscle groups of the neck.

His work includes a comparison of the strength of men and women,
and a distribution of muscular strength in the two sexes.

²Meylan, G. L., “Marks for Physical Efficiency,” Am. Phys. Educ. Rev., X
(June, 1905), 106.
³Burton-Opitz, R., “Tests of Physical Efficiency,” Am. Phys. Educ. Rev., XXVII
(April, 1922), 155.
⁴Kellogg, J. H., “The Value of Strength Tests in the Prescription of Exercise,”
Modern Medicine Library, II, 1896.

The principle of Kellogg’s dynamometer consists in applying
pressure to a piston by which a column of mercury is raised, the
height of the column indicating the amount of pressure. Serious
disadvantages of this dynamometer are that it is a stationary piece
of apparatus which cannot be moved about freely, and that it is
beyond the resources of most small colleges.

Martin's “Resistance Test.” The next important step in the
development of strength tests was Martin's⁵ “resistance strength
test.” In 1915, while making an intensive study of the after-effects
of the 1914 epidemic of infantile paralysis which swept the state of
Vermont, he needed a form of dynamometer with which to compare
impaired and unimpaired muscles. An ordinary flat-faced spring
balance was selected because it could be readily adjusted to meet
conditions of proper alignment and pull. The principle of resistance
was also introduced: “resistance to a pull rather than the exertion of
strength in an active effort.” The test was made applicable to eleven
muscle groups on each of the upper extremities and ten on each of
the lower, but it consumed entirely too much time and soon gave
way to a “short test” in which the strength of four pairs of muscle
groups was measured, namely, pectorals, forearm flexors, thigh
adductors and thigh abductors. The strength of these four groups
was shown to correlate highly with the total strength. To obtain the
total strength, the result of the short test is multiplied by 6.67.
Martin also showed that with children from five to eighteen, the
strength of the extensors and flexors of forearm and wrist correlated
highly with the entire strength, the correlation being .91 with a P.E.
of .0168. With children the ratio of strength to weight seems to be
fairly constant, 20 for males and 18 for females. This relationship
does not show the same tendency to be constant in adult males,
though there is a moderate correlation, and the ratio of strength to
weight may be said to apply within reasonable limits to 63 per cent
of the cases examined. A further conclusion is that a high
strength-weight ratio signifies good muscle quality and good
innervation.

⁵Martin, E. G., “Tests of Muscular Efficiency,” Physiol. Rev., I (July, 1921), 454.

Not least among the important contributions of Martin was the 
establishment of the general principle that the strength of a few 
muscles is a good indication of the strength of the body as a whole. 
This principle is to be found in all succeeding experimental work and 
is taken for granted in most of our present-day testing. 
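Martin's short-test estimate and the strength-weight ratio can be sketched as below. This is an illustration under stated assumptions: the readings are assumed to be the right- and left-side spring-balance values for the four muscle groups, and the excerpt does not give the units behind the ratios of 20 and 18, so none are enforced here.

```python
MARTIN_SHORT_TEST_FACTOR = 6.67  # multiplier applied to the short-test result


def martin_total_strength(short_test_readings):
    """Estimate total strength: sum of the short-test readings times 6.67."""
    readings = list(short_test_readings)
    if not readings:
        raise ValueError("at least one reading is required")
    return sum(readings) * MARTIN_SHORT_TEST_FACTOR


def strength_weight_ratio(total_strength, body_weight):
    """Total strength divided by body weight (units not stated in the source)."""
    if body_weight <= 0:
        raise ValueError("body weight must be positive")
    return total_strength / body_weight


if __name__ == "__main__":
    # Hypothetical right/left readings for pectorals, forearm flexors,
    # thigh adductors and thigh abductors.
    readings = [40, 42, 35, 36, 60, 58, 55, 54]
    total = martin_total_strength(readings)
    print(round(total, 1), round(strength_weight_ratio(total, 130), 1))
```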

Rogers’ Strength Index. To solve the administrative problem of
grouping boys for team competition within the school and within
the class period, Rogers⁶ has worked out a classification based on
the total score obtained in a strength test. His program of testing
for the high school boy has taken account of the establishment of
criteria of validity, and he has further demonstrated the reliability
of the tests.


The strength index, which is a score indicating the strength of the 
large voluntary muscles of the body, together with lung capacity, 
conforms to all the criteria of a useful measure of general athletic 
ability. First, and most important, it is a highly valid measure, 
being over two and one-half times as accurate as weight, and nearly 
twice as accurate as the best possible combination of age, height
and weight. It is economical of time; boys can be given all tests at 
the rate of one boy per minute, using only a single adult tester with 
a few student assistants, and indices can be computed in fifteen to 
twenty seconds each. Boys are fascinated by the testing procedure, 
often asking for retests. Thus their whole-hearted cooperation is 
secured without difficulty. The tests are well adapted to the age 
and sex of the subjects; they are easily and accurately scored in 
mathematical terms; they are objective, and the strength index is 
probably more reliable than any highly valid composite mental test 
available.⁷

The strength index score is obtained by adding the following items:

1. Number of cubic inches in lung capacity.
2. Number of pounds pressure in right grip.
3. Number of pounds pressure in left grip.
(For items 2 and 3 an oval hand dynamometer is used.)
4. Number of pounds lifted, using back.
5. Number of pounds lifted, using legs (for items 4 and 5 a back
and legs dynamometer is used).
6. Strength of arms, calculated thus:

(Pull-ups + push-ups) × [W/10 + (height − 60)]

where W is the weight in pounds and height is in inches.

⁶Rogers, Frederick Rand, Physical Capacity Tests in the Administration of
Physical Education. New York, Teachers College, Columbia University,
Contributions to Education No. 173, 1925.
⁷Ibid., p. 18.


A Physical Fitness Index is derived from the Strength Index by
dividing the SI by the norm for the pupil's age and weight and
multiplying by 100.⁸
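The full Strength Index and the Physical Fitness Index can be sketched as below. This is a minimal illustration, not Rogers' own procedure; the norm value is a made-up placeholder, since in practice it is read from Rogers' age-and-weight norm tables, which are not reproduced here.

```python
def rogers_arm_strength(pull_ups, push_ups, weight_lb, height_in):
    """(Pull-ups + push-ups) x [W/10 + (height - 60)]."""
    return (pull_ups + push_ups) * (weight_lb / 10 + (height_in - 60))


def rogers_strength_index(lung_capacity_cu_in, right_grip_lb, left_grip_lb,
                          back_lift_lb, leg_lift_lb, pull_ups, push_ups,
                          weight_lb, height_in):
    """Sum of the six Strength Index items."""
    arm = rogers_arm_strength(pull_ups, push_ups, weight_lb, height_in)
    return (lung_capacity_cu_in + right_grip_lb + left_grip_lb
            + back_lift_lb + leg_lift_lb + arm)


def physical_fitness_index(strength_index, norm_for_age_and_weight):
    """SI divided by the age-and-weight norm, times 100."""
    return 100 * strength_index / norm_for_age_and_weight


if __name__ == "__main__":
    si = rogers_strength_index(250, 90, 85, 300, 600, 8, 7, 130, 66)
    print(si)                                # 1610.0
    print(physical_fitness_index(si, 1400))  # 115.0 (1400 is a hypothetical norm)
```

A PFI above 100 means the boy's index exceeds the norm for his age and weight.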


Rogers' Short Strength Index. This index is for the use of the
teacher who does not have available all the apparatus for the
complete test. Says Rogers, “This short method is inadequate as a
measure of endurance, nor can physical fitness indices be calculated
from it; but as a quick and cheap method of determining general
athletic ability, without these, it is almost as accurate as the
complete test.”⁹

The scoring for the short strength index is as follows:

1. Three times the sum of right grip plus left grip in pounds, added to
2. (Pull-ups + push-ups) × [W/10 + (height − 60)].
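The short index combines the two grips with the same arm-strength term used in the complete test. A minimal sketch, assuming grips in pounds, weight in pounds and height in inches:

```python
def rogers_short_index(right_grip_lb, left_grip_lb, pull_ups, push_ups,
                       weight_lb, height_in):
    """3 x (right grip + left grip) + (pull-ups + push-ups) x [W/10 + (height - 60)]."""
    arm = (pull_ups + push_ups) * (weight_lb / 10 + (height_in - 60))
    return 3 * (right_grip_lb + left_grip_lb) + arm


if __name__ == "__main__":
    print(rogers_short_index(90, 85, 8, 7, 130, 66))  # 810.0
```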


Rogers has worked out tables of strength norms for various ages
and weights and uses these to show the boys how near they come to
certain minimum standards.

McCloy's Method of Scoring Chinning and Dipping. The
formula used by Rogers¹⁰ to compute arm strength, namely

(Pull-ups + push-ups) × [W/10 + (height − 60)]

has been found by McCloy¹¹ to give spuriously low or high results
in the upper levels of strength. His experimental procedure involved
securing data on total pull-up strength by means of a spring
dynamometer, as well as on the total number of pull-ups possible.

⁸See page 182 for a discussion of the PFI.
⁹Rogers, Frederick Rand, op. cit., p. 155.
¹⁰Rogers, Frederick Rand, Tests and Measurements Programs in the Redirection of
Physical Education, p. 58. New York, Bureau of Publications, Teachers College,
Columbia University, 1927.
¹¹McCloy, C. H., “A New Method of Scoring Chinning and Dipping,” Res. Quart.
Am. Phys. Educ. Assoc., Vol. II, No. 4 (December, 1931), 132-145.

From these data it was possible to compute two formulae for
estimating total strength, knowing the number of chins made and the
weight of the subject:

TS = 1.77W + 3.42C − 46

TS = 1.27C …

where TS = total strength, C = number of chins and W = weight.
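The first McCloy formula can be compared with the chinning part of Rogers' arm-strength formula (dips set to zero). This sketch assumes weight in pounds and height in inches; the excerpt states no units for McCloy's W. The second McCloy formula is not legible in the source and is not implemented.

```python
def mccloy_chinning_strength(weight, chins):
    """McCloy's first formula: TS = 1.77W + 3.42C - 46."""
    return 1.77 * weight + 3.42 * chins - 46


def rogers_chinning_component(chins, weight_lb, height_in):
    """Rogers' arm-strength formula with dips set to zero, for comparison."""
    return chins * (weight_lb / 10 + (height_in - 60))


if __name__ == "__main__":
    for chins in (5, 10, 20):
        print(chins,
              round(rogers_chinning_component(chins, 150, 66), 1),
              round(mccloy_chinning_strength(150, chins), 1))
```

Rogers' score grows in direct proportion to the number of chins, while McCloy's adds a fixed 3.42 per chin to a term driven mainly by body weight. That difference in form is consistent with McCloy's finding that the older formula misbehaves at the upper levels of strength.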
McCloy¹² indicates that:


Either formula gives with relative accuracy the strength of the 
individual which correlates very highly with his total strength and 
is more significant than total strength as a predictor of athletic 
ability. . . . Chinning and dipping, or chinning alone, scored in 
this way can be used as a classifying device which seems to be as 
adequate on the whole as the total strength test. 


Tables are given which facilitate rapid calculation by each of
these formulae.

Using this method of scoring arm strength and combining it with
the other elements in the Rogers Strength Test, McCloy¹³ has
found in five groups of studies, using four different criteria, that arm
strength is of great importance in a wide variety of types of athletics.
The groups studied varied in age range, but all were below the
college age level and consisted principally of junior and senior high
school boys.

One implication of practical value may be drawn from this study,
namely, that our programs in physical education must provide
opportunity for activities designed to develop arm and
shoulder-girdle strength.

Prediction of Total Strength. Wendler¹⁴ has shown “that
the muscle groups measured by testing procedures commonly
employed at the present time are not of equal predictive value and
that some of the measures are of little importance for predicting
total strength.” The list of muscle groups “whose strength when
properly combined gives the best prediction of total strength of
men” is as follows:

*1. Thigh extensors.
*2. Leg extensors.
*3. Pectoralis major.
*4. Arm flexors.
5. Anterior trunk.
6. Foot extensors.

Multiple correlation with total strength, 6 groups: R = .956.
Multiple correlation with total strength, 4 starred (*) groups: R = .953.

¹²Ibid., p. 139.
¹³McCloy, C. H., “The Apparent Importance of Arm Strength in Athletics,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1 (March, 1934), 3-11.
See also: McCloy, C. H., Tests and Measurements in Health and Physical Education,
pp. 22-24. F. S. Crofts and Company, 1942.
¹⁴Wendler, Arthur J., “An Analytical Study of Strength Tests Using the Universal
Dynamometer,” Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 3
(October, 1935), 81-85.


The battery of weighted strength tests for women which gives
the highest multiple correlation with total strength (R = .958) is:

Test                     Weight
1. Thigh flexors            7
2. Thigh extensors          5
3. Leg extensors            5
4. Pectoralis major         7
5. Deltoids                 …
6. Hand flexors             …


Multiple Strength Indices of General Motor Ability.
Dunder¹⁵ has found that back and leg strength, when used in
combination with other strength factors, contribute very little to
the relationship between a strength index and a motor ability test
composed of a large number of individual items (25 in this instance).
For all practical purposes, with high school boys, a combination of
weight, chins and grip strength offers a strength index of considerable
value.

This is in line with the previous findings of Rump¹⁶ that chin and
dip strength are very important in the determination of total
strength but that back and leg strength add little of value to the
relationship with general athletic ability.

Larson’s¹⁷ factor analysis study of strength items revealed two
components: dynamic strength, defined as the ability to raise the
body weight and propel it upward; and static dynamometrical
strength, the strength registered by a dynamometer and indicating
ability to lift, pull, push or squeeze. Dynamic strength was found
to have almost three times the predictive power of static
dynamometrical strength in relation to motor ability. The proposed
strength test, utilizing elements of dynamic strength, is a three-item
battery including dips, chins and the vertical jump. An index score is
computed by changing the raw scores on the three items into
weighted standard scores. Scoring tables and a classification chart
for men seventeen to twenty-four years of age appear in the listed
source. This test is referred to elsewhere in this text and in the
literature of the field as Larson’s Dynamic Strength Test and as the
Springfield Strength Test.

¹⁵Dunder, Victor C., “A Multiple Strength Index of General Motor Ability,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. IV, No. 3 (October, 1933), 132-142.
¹⁶Rump, A. H., The Relative Contribution of Arm, Back, Abdomen and Leg Strength
to the General Athletic Ability of High School Boys. Unpublished master’s thesis,
University of Iowa, 1931.
¹⁷Larson, Leonard A., “A Factor Analysis of Strength Variables and Tests with a
Test Combination of Chinning, Dipping and Vertical Jump,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 4 (December, 1940), 82-96.
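The idea of turning raw scores into weighted standard scores can be sketched generically. This is not Larson's scoring: his weights and tables are in the cited source and are not reproduced here. The sketch assumes linear T-scores (50 + 10z, population standard deviation) as the standard score, and the equal weights in the usage example are placeholders.

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Linear T-scores (50 + 10 * z), using the population standard deviation."""
    m, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("cannot standardize scores with zero spread")
    return [50 + 10 * (x - m) / sd for x in raw]


def weighted_standard_index(items, weights):
    """Weighted sum of per-item T-scores, one index per subject.

    items   -- mapping of item name to raw scores, all in the same subject order
    weights -- mapping of item name to weight (placeholders, not Larson's values)
    """
    lengths = {len(scores) for scores in items.values()}
    if len(lengths) != 1:
        raise ValueError("every item needs one score per subject")
    per_item = {name: t_scores(scores) for name, scores in items.items()}
    n_subjects = lengths.pop()
    return [
        sum(weights[name] * per_item[name][i] for name in per_item)
        for i in range(n_subjects)
    ]


if __name__ == "__main__":
    raw = {"dips": [4, 8, 12], "chins": [5, 10, 15], "vertical_jump_in": [18, 20, 22]}
    equal_weights = {"dips": 1.0, "chins": 1.0, "vertical_jump_in": 1.0}
    print(weighted_standard_index(raw, equal_weights))
```

Converting to standard scores first puts dips, chins and inches jumped on a common scale, so the weights and not the raw units determine each item's influence.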


Weighted Strength Tests for High School Girls. In an
experimental study with 500 Des Moines, Iowa, high school girls,
Anderson¹⁸ concludes that “neither total strength nor the Physical
Fitness Index is a very valid predictor of the athletic ability of the
girls tested. It does not at all demonstrate that strength is not a
very valuable element in the total motor ability of high school girls.
In view, however, of the widespread reliance on strength tests as
predictors of such abilities for the purpose of classifications of girls
for gymnasium activities, it would seem that there should be less
confidence placed in the efficacy of strength tests for this purpose,
and a continued endeavor to make further contributions to this
problem with other tests, at least until more experimental evidence
has been adduced to support the validity of strength tests for this
purpose.”

A short battery of weighted strength tests, formulated to predict
athletic ability but showing a rather low multiple correlation
coefficient (R = .489), is suggested:

Weighted Strength Index = 5 (thigh flexors) + 7 (push) + 1 (leg lift).

Standards of performance are also shown.
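Anderson's weighted index is a direct weighted sum. A minimal sketch; the excerpt does not give units for the three readings, so the function applies the stated weights to whatever units the tester records.

```python
def anderson_weighted_index(thigh_flexors, push, leg_lift):
    """Weighted Strength Index = 5 (thigh flexors) + 7 (push) + 1 (leg lift)."""
    return 5 * thigh_flexors + 7 * push + 1 * leg_lift


if __name__ == "__main__":
    print(anderson_weighted_index(100, 50, 300))  # 1150
```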
Weighted Strength Tests for High School Boys. In the
attempt to find a simple, non-time-consuming battery for the
prediction of strength of high school boys, Stansbury¹⁹ proposes a
Strength Index based on the following formula:

S.I. = 1.4 (8 lb. shot-put in feet) + (standing broad jump in inches) + (weight
in pounds).

¹⁸Anderson, Theresa W., “Weighted Strength Tests for the Prediction of Athletic
Ability in High School Girls,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VII,
No. 1 (March, 1936), 136-142.
¹⁹Stansbury, Edgar, “A Simplified Method of Classifying Junior and Senior Boys
Into Homogeneous Groups for Physical Education Activities,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 4 (December, 1941),
765-776.




Norm tables are available. The author proposes a Physical
Efficiency Index, which is computed by multiplying the Strength Index
score by 100 and dividing by the norm score. A multiple R of .845
was found between this battery and Rogers' Strength Index.
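Stansbury's index and the Physical Efficiency Index can be sketched together. The norm score used below is a hypothetical placeholder; real values come from Stansbury's norm tables.

```python
def stansbury_strength_index(shot_put_ft, broad_jump_in, weight_lb):
    """S.I. = 1.4 x (8-lb shot-put, feet) + standing broad jump (inches) + weight (pounds)."""
    return 1.4 * shot_put_ft + broad_jump_in + weight_lb


def physical_efficiency_index(strength_index, norm_score):
    """Strength Index multiplied by 100 and divided by the norm score."""
    if norm_score <= 0:
        raise ValueError("norm score must be positive")
    return strength_index * 100 / norm_score


if __name__ == "__main__":
    si = stansbury_strength_index(30, 80, 130)
    print(round(si, 1), round(physical_efficiency_index(si, 240), 1))  # 240 is made up
```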

Weighted Strength Tests for Elementary School Children.
Carpenter²⁰ proposes similar formulae for use with boys and girls
of the first three grades:

Boys’ strength: .1 broad jump + 2.5 shot-put (4 lbs.) + weight.
Girls’ strength: .5 broad jump + 3 shot-put + weight.

Physical Efficiency Indices are computed by the same method as
the Stansbury Index.²¹ This test is considered of value because of
the difficulty of measuring back and leg lift strength in young
children by standard methods utilizing a dynamometer. Multiple
R’s of .63 for boys and .497 for girls were found when a “total girdle
strength” criterion was used. The absence of back and leg strength
from the criterion, together with the importance of these factors in
the activities used, is thought to be partly responsible for the
relatively low correlations.
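Carpenter's two formulae differ only in their coefficients, so a small lookup table is enough. This sketch uses the coefficients exactly as printed above; the excerpt does not state units, so the same units as Stansbury's index (broad jump in inches, shot-put in feet, weight in pounds) are an assumption.

```python
# Coefficients as printed: (broad jump, shot-put). Units assumed, not stated.
CARPENTER_COEFFICIENTS = {
    "boy": (0.1, 2.5),   # shot-put with the 4-lb shot
    "girl": (0.5, 3.0),
}


def carpenter_strength(sex, broad_jump, shot_put, weight):
    """Weighted strength score for children of the first three grades."""
    try:
        jump_coef, shot_coef = CARPENTER_COEFFICIENTS[sex]
    except KeyError:
        raise ValueError("sex must be 'boy' or 'girl'") from None
    return jump_coef * broad_jump + shot_coef * shot_put + weight


if __name__ == "__main__":
    print(round(carpenter_strength("boy", 60, 10, 50), 1))
    print(round(carpenter_strength("girl", 60, 10, 45), 1))
```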

Weighted Strength Tests for College Women. Proceeding
on McCloy’s²² theory that arm strength is as accurate a predictor
of motor ability as is total strength, Wilson²³ devised some simple
test batteries for use when special strength-testing machinery or
devices were lacking. Rogers’ Short Index²⁴ was used as the
validating criterion. Performance items which correlated highly with
the criterion were selected, and the following formulae were derived
by means of the regression equation:

(1) .5 Pull-up + 1.0 Vertical Pull.
(2) 2.9 Pull-up + 1.0 Sum of Push and Pull.
(3) 1.0 Pull-up + .5 Basketball Throw.
(4) .9 Vertical Pull + 1.0 Push-up (bench).
(5) 3.8 Pull-up + 1.0 Weight Holding.
(6) .7 Vertical Pull + 1.0 Push-up (knees).

²⁰Carpenter, Aileen, “Strength Testing in the First Three Grades,” Res. Quart.
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 3 (October, 1942),
328-332.
²¹Stansbury, Edgar, op. cit.
²²McCloy, C. H., “A New Method of Scoring Chinning and Dipping,” op. cit.
²³Wilson, Marjorie, “A Study of Arm and Shoulder-Girdle Strength of College
Women in Selected Tests,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. 15, No. 3 (October, 1944), 258-267.
²⁴Rogers, Frederick Rand, Physical Capacity Tests in the Administration of
Physical Education, op. cit.




The validity coefficients range from .785 to .865 for the six 
batteries. The data for the test were gathered from physical 
education majors as subjects. 
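Wilson's six batteries can be stored as coefficient tables and evaluated with one function. A sketch only: the item keys are hypothetical names for the test items, and the excerpt gives no units, so raw performance values are passed through unchanged.

```python
# Coefficients from Wilson's six regression formulae; item keys are hypothetical names.
WILSON_BATTERIES = {
    1: {"pull_up": 0.5, "vertical_pull": 1.0},
    2: {"pull_up": 2.9, "push_and_pull_sum": 1.0},
    3: {"pull_up": 1.0, "basketball_throw": 0.5},
    4: {"vertical_pull": 0.9, "push_up_bench": 1.0},
    5: {"pull_up": 3.8, "weight_holding": 1.0},
    6: {"vertical_pull": 0.7, "push_up_knees": 1.0},
}


def wilson_score(battery, performance):
    """Apply the chosen battery's coefficients to a dict of raw performances."""
    coefficients = WILSON_BATTERIES[battery]
    missing = coefficients.keys() - performance.keys()
    if missing:
        raise KeyError(f"missing items: {sorted(missing)}")
    return sum(c * performance[item] for item, c in coefficients.items())


if __name__ == "__main__":
    print(wilson_score(1, {"pull_up": 10, "vertical_pull": 80}))
```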

Strength and General Athletic Ability for College Men.
Cozens²⁵ collected data on ten strength tests and a number of
combinations using unselected college men. These data were analyzed
in relation to (1) their usefulness in predicting general athletic
ability, using Cozens' General Athletic Ability Test as a criterion;
(2) a possible substitution for dips in the General Athletic Ability
Test; and (3) the formulation of a short battery of tests for
measuring strength. The following conclusions were drawn:

1. For rough estimations, the index of Chins + Dips + Height
is probably the most useful because of its speed of administration
(r = .645).

2. The only substitute for dips in the battery of general athletic
ability tests which increased the multiple correlation coefficient was
McCloy’s Arm Strength Index. This increase is relatively small,
from R = .972 to R = .974. Though McCloy’s Index has a much
lower correlation with the criterion (r = .455) than Rogers’
Strength Index (r = .613), its intercorrelations with the other items
in the battery are also much lower, resulting in an increase in the
multiple relationship. However, the increase is not sufficient to
justify the additional computations necessary. As an element of
arm strength in a general athletic ability test for college men,
dipping on the parallel bars is the most useful single measure.

3. The factors of age and weight, so prominent in Rogers’ Strength
Index for high school boys, have no significance in a strength index
designed to predict general athletic ability in college men. The
factor of height, however, becomes rather prominent in an index for
college men. Rogers found the reverse to be true with high school
boys.

4. Great differences exist between high school boys and college
men insofar as the relationships among age, height and weight are
concerned. Th… between these factors and strength test items. In
view of these facts, secondary school perfo… basis for extending
strength …


5. Using ten strength test items as a criterion, a short battery of
strength tests was formulated yielding an R of .982. This battery
consists of five tests properly weighted: back lift, leg lift, arm push,
chins and dips. The regression equation is:

X₀ (estimated strength) = 1.533 Back Lift + 1.052 Leg Lift + .938
Arm Push + 10.9 Chins + 14.94 Dips + 405.6.

²⁵Cozens, Frederick W., “Strength Tests as Measures of General Athletic Ability
in College Men,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. XI, No. 1 (March, 1940), 45-52.
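Cozens' five-item regression equation can be evaluated as follows. A minimal sketch, assuming back lift, leg lift and arm push are dynamometer readings in pounds and chins and dips are counts; the dictionary keys are hypothetical names.

```python
COZENS_WEIGHTS = {
    "back_lift": 1.533,
    "leg_lift": 1.052,
    "arm_push": 0.938,
    "chins": 10.9,
    "dips": 14.94,
}
COZENS_CONSTANT = 405.6


def cozens_estimated_strength(scores):
    """Estimated strength from the five-item regression equation."""
    missing = COZENS_WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing items: {sorted(missing)}")
    weighted = sum(w * scores[item] for item, w in COZENS_WEIGHTS.items())
    return weighted + COZENS_CONSTANT


if __name__ == "__main__":
    subject = {"back_lift": 300, "leg_lift": 600, "arm_push": 100, "chins": 10, "dips": 10}
    print(round(cozens_estimated_strength(subject), 1))
```

Because chins and dips are counts rather than pounds, their much larger coefficients put them on a footing comparable to the dynamometer items.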


Recent Strength Testing Research. Much of the more 
recent research in the field of strength tests has been directed toward 
standardizing strength test items, exploring relationships between 
accepted strength tests and other factors, and developing predictive 
indices rather than toward the building of new strength tests. 
Some of the recent researches, not previously mentioned in this 
chapter, are briefly summarized. 

Assuming that the angle at which the leg lift in the Rogers
Physical Capacity Test is made would affect the amount of lift
achieved, Carpenter²⁶ conducted studies from which she concluded
that the maximum leg lift is obtained when the lift is made with
the legs and thighs making an angle at the knees of between 115
and 124 degrees.

Everts and Hathaway²⁷ describe an improved method of measuring
leg lift in the Rogers Physical Capacity Test. The use of a simple
belt technique increases the accuracy of measurement, affords
greater safety and is less fatiguing for use with women, and pupils
may be tested in street attire.

Observing that some individuals who performed above average
on tests of arm strength, endurance and cardiovascular condition
were below average on sit-up tests, DeWitt²⁸ investigated the
validity of sit-ups as a measure of abdominal strength and
endurance. His criterion for abdominal strength was the force, in
pounds, exerted by the abdominal muscles in raising the trunk from
a back-rest position, measured by a dynamometer attached to a belt
under the armpits. Endurance was measured by the length of time
the subject could maintain a position with feet clasped under stall
bars and trunk raised just clear of the floor.

²⁶Carpenter, Aileen, “A Study of Angles in the Measurement of the Leg Lift,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 3
(October, 1938), 70-72.
²⁷Everts, Edgar W. and Hathaway, Gordon J., “The Use of a Belt to Measure
Leg Strength Improves the Administration of Physical Fitness Tests,” Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 3 (October,
1938).
²⁸DeWitt, R. T., “A Study of the Sit-Up Type Test as a Means of Measuring
Strength and Endurance of the Abdominal Muscles,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 15, No. 1 (March, 1944), 60-65.


Three types of sit-up tests were administered, and DeWitt concluded
that sit-ups were not a valid measure of strength and endurance of
the abdominal muscles, and that heavier and taller men are
handicapped in this type of test.

Wedemeyer,²⁹ using two-minute continuous sit-ups and the
Martin breaking method as criteria, concluded that sit-ups do
measure a combination of endurance and strength of the abdominal
and thigh-flexor muscles, but that there appears to be little
relationship between strength and weight. The conflict between
these two studies probably arises from the fact that DeWitt used a
static strength criterion for a dynamic strength item, and Larson³⁰
has shown that there is a low correlation between these two types of
strength.

Karpovich³¹ investigated the validity of substituting limited-time
sit-ups for unlimited-time sit-ups. Since the correlation was only
.60, he concluded that the relationship did not justify substitution.

Havlicek,³² working with Armed Forces sit-up tests, observed
that the 100th-percentile score did not challenge the limits of
well-conditioned men. He experimented to find the best time limit to
place on speed sit-ups and concluded that the three-minute limit
was the most accurate measure of abdominal strength. He gives
T-scores for one-, two-, three- and five-minute sit-ups.

Karpovich, Weiss and Elbel³³ report a correlation of only .38
between leg-lifts and sit-ups, indicating that these two items should
not be used interchangeably in test batteries. The low correlation
was attributed to the larger number of muscles involved in the more
complex movement of the sit-up.

Karpovich³⁴ found that the forward grip and reverse grip could
not be used interchangeably on chin-up tests. On the average one
can perform 2.4 more chins with the reverse grip, and the difference
increases as the number of chins increases. DeWitt³⁵ substantiates
his findings, and shows also that the use of a kip or kick will
slightly increase the number of chins. He emphasizes the need for
specifying exact procedures in chinning tests.

²⁹Wedemeyer, Ross, “A Differential Analysis of Sit-Ups for Strength and
Muscular Endurance,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 17, No. 1 (March, 1946), 40-47.
³⁰Larson, Leonard A., op. cit.
³¹Karpovich, Peter V., “Studies of the AAF Physical Fitness Test: Selection of
a Time Limit for Sit-Ups,” Project No. 245, Report No. 3, AAF School of
Aviation Medicine, Randolph Field, Texas, July 12, 1944.
³²Havlicek, Frank J., “Speed Sit-ups,” Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. 15, No. 1 (March, 1944), 75-77.
³³Karpovich, Peter V., Weiss, Raymond A. and Elbel, Edwin R., “Relation
Between Leg-Lift and Sit-Up,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 17, No. 1 (March, 1946), 21-24.
³⁴Karpovich, Peter V., “The Effect of Reverse and Forward Grips Upon
Performance in Chinning,” Project No. 178, Report No. 1, AAF School of
Aviation Medicine, Randolph Field, Texas, October, 1943.

Jones³⁶ found that strength seems to be more closely related to
physiological age than to chronological age, inasmuch as
post-menarcheal girls were stronger than pre-menarcheal girls of
the same age.

Cureton³⁷ showed that the mesomorphic group possessed greater
contracted muscle girths, and achieved higher scores on the Rogers
Strength Index, in a study of college men classified according to the
Sheldon technique.

Carpenter³⁸ found chinning and dipping not a satisfactory arm
strength test for women and suggests the substitution of a push and
pull test utilizing an attachment to the grip dynamometer. These
scores can be converted to chin and dip equivalents, and when
combined in the total battery the McCloy norms can be used. She
further concluded that strength and power are two of the most
important factors in the athletic ability of women, but that
femininity as measured was a negligible factor.³⁹


Power Tests 


Power appears to be a basic component of achievement in athletic
skills. It is a mechanical quantity equal to force times velocity,
and is important in those athletic events which involve the
projection of the body or any other implement through space.


³⁵DeWitt, R. T., “A Comparative Study of Three Types of Chinning Tests,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 3 (October,
1944), 240-248.
³⁶Jones, Harold E., “The Sexual Maturing of Girls as Related to Growth in
Strength,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 18,
No. 2 (May, 1947), 135-145.
³⁷Cureton, Thomas K., Jr., Physical Fitness Appraisal and Guidance, p. 367.
St. Louis, C. V. Mosby Company, 1947.
³⁸Carpenter, Aileen, “A Critical Study of the Factors Determining Effective
Strength Tests for Women,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. IX, No. 4 (December, 1938), 3-92.
³⁹Carpenter, Aileen, “Strength, Power and ‘Femininity’ as Factors Influencing
the Athletic Performance of College Women,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. IX, No. 2 (May, 1938), 98-120.




The simplest available test of power is the vertical jump, the
development and use of which is described below. Studies have
supported the contention that the standing broad jump may be
considered as adequate a test of power as the vertical jump.⁴⁰ It
will be noted that in several of the recently developed motor
performance tests the broad jump serves to measure this quality.
Limitations on its use lie in the lack of established standards and
the effect of learning the rather complex skills involved.

The Vertical Jump. The Physical Test of a Man.⁴¹ Sargent
points out that height and weight are usually thought of in
connection with the power or strength of an individual, but goes on
to say that these measurements really do not give a correct idea of
“innervation of the parts, upon which power and efficiency so
frequently depend.” The test which he proposed uses the constant
factors of height and weight together with an individual’s ability
to overcome the constant force of gravity, in the following manner:

Efficiency index = (Weight in pounds × Height jumped in inches) / Stature in inches


The jump should be made straight into the air with the head 
touching a cardboard disk placed at the highest point above the 
head that can be just touched in jumping. In commenting upon its 
possible value Sargent says, "I think, therefore, that the test as a 
whole may be considered as a momentary try-out of one's strength, 
speed, energy and dexterity combined, which, in my opinion, fur- 
nishes a fair physical test of a man, and solves in a simple way his 
unknown equation as determined potentially by his height and 
weight.” It was discovered by later experimentation⁴² that the
jump was independent of both height and weight and hence that
these factors could be discarded.
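Sargent's original efficiency index is a single line of arithmetic, sketched below for illustration. In light of the later finding that the jump is independent of height and weight, the height jumped could simply be used by itself.

```python
def sargent_efficiency_index(weight_lb, jump_in, stature_in):
    """(Weight in pounds x height jumped in inches) / stature in inches."""
    if stature_in <= 0:
        raise ValueError("stature must be positive")
    return weight_lb * jump_in / stature_in


if __name__ == "__main__":
    print(sargent_efficiency_index(140, 20, 70))  # 40.0
```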

Schwegler and Englehardt Variation of the Sargent Test.⁴³ Schwegler
and Englehardt of the University of Kansas report an interesting
variation of this test in which the subject jumps as many times as
possible in fifteen seconds, staying within a 2-foot ring while
jumping.

⁴⁰McCloy, C. H., Tests and Measurements in Health and Physical Education, p. 65.
⁴¹Sargent, Dudley A., “The Physical Test of a Man,” Am. Phys. Educ. Rev.,
XXVI (April, 1921), 188-194.
⁴²Sargent, L. W., “Some Observations on the Sargent Test of Neuromuscular
Efficiency,” Am. Phys. Educ. Rev., XXIX (February, 1924), 47-56.
⁴³Schwegler, R. A. and Englehardt, J. L., “A Test of Physical Efficiency,” Am.
Phys. Educ. Rev., XXIX (November, 1924), 501-505.




An apparatus automatically records the heights of the successive
jumps on a paper record sheet. These men have found that the most
satisfactory formulae for computing the physical index are (a) for
college men:

Index = (Sum of the jumps during 15 seconds × √Weight) / Height

and (b) for junior high school boys:

Index = (Sum of the jumps during 15 seconds × √Weight) / (Age × Height)
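Both Schwegler-Englehardt formulae can be sketched as follows. The units are assumptions, since the source does not state them: jump heights and stature in inches, weight in pounds, age in years.

```python
import math


def se_index_college(jump_heights, weight, height):
    """College men: sum of jumps in 15 s x sqrt(weight) / height."""
    return sum(jump_heights) * math.sqrt(weight) / height


def se_index_junior_high(jump_heights, weight, age, height):
    """Junior high school boys: sum of jumps in 15 s x sqrt(weight) / (age x height)."""
    return sum(jump_heights) * math.sqrt(weight) / (age * height)


if __name__ == "__main__":
    jumps = [10, 10, 10, 10]  # hypothetical recorded heights
    print(se_index_college(jumps, 144, 60))  # 8.0
    print(round(se_index_junior_high(jumps, 144, 12, 60), 3))
```

Dividing by age in the junior high formula scales the index down for older, and typically bigger, boys, so that scores are more comparable across ages.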


The “Leap-meter.”⁴⁴ In this study the authors used a piece of
apparatus similar to that of Schwegler and Englehardt, though of
their own design. The subject to be tested is fitted with a cap to
which is fastened a cord operating a lever arm. This in turn works
a pen guide holding a pen moving on graph paper, and the height of
the jumps is recorded in reduced size. These records may be
accurately measured: … of an inch on the record equals 1 inch of
height jumped, and readings may be interpolated to the nearest ¼
inch. The object of the experiment was to investigate the
possibilities of the Sargent test (best single jump) and the
Schwegler-Englehardt test (sum of jumps in fifteen seconds) as
measures of general athletic ability. The criterion selected for
general athletic ability consisted of a group of four athletic ability
tests (dash, high jump, rope climb and bar vault) used at the
University of Oregon. The authors found that the
Schwegler-Englehardt test was of no value in predicting general
athletic ability as indicated by the criterion, but that the Sargent
test offers some possibility for use in measurement of this kind.

Further Studies on the Sargent Jump Test.45 By offering adequate
practice in the technique of jumping and selecting the best jump
from two series of three jumps each, McCloy46 finds that the
reliability coefficient of the Sargent Jump can be raised to a point
sufficiently high for satisfactory use (r = .854; corrected for
attenuation, r = .980). When the Sargent Jump is added to
McCloy's Classification Index47 the correlations with batteries of
track and field events become rather significant, ranging between
.72 and .95 for various groups of boys from the fifth grade through
college. McCloy48 feels that the Sargent Jump, when combined
with an appropriate formula containing the factors of age, height
and weight, predicts the power type of athletic ability but does not
measure all the elements contained in motor educability or agility.
He further indicates that it is probably the best single measure we
have at the present time of predicting explosive energy. The formula
recommended for use is 25 (Sargent Jump in inches) - (Classification
Index).

44Bovard, John F. and Cozens, Frederick W., The "Leap-meter," An Investigation
into the Possibilities of the Sargent Test as a Measure of General Athletic Ability.
Eugene, University of Oregon Press, 1928.

45Gerrish has perfected a "force-meter" (a type of spring scale) which offers an
accurate means of determining the variable force and power of the standing
vertical jump. By plotting certain curves for any individual, the force exerted,
the height and velocity acquired, and the power developed at any moment of
the jump can be noted. See Paul Herbert Gerrish, A Dynamic Analysis of the
Standing Vertical Jump. Doctoral thesis, Teachers College, Columbia
University, 1934. Published privately by the author.

46McCloy, C. H., "Recent Studies in the Sargent Jump," Res. Quart. Am. Phys.
Educ. Assoc., Vol. III, No. 2 (May, 1932), 235-242.


140 The Status of Measurement in Physical Education

Adams49 studied a group of junior high school girls to determine
the best combination of age, height, weight and Sargent Jump to 
predict athletic ability. She found that the factors of age, height 
and weight are of little practical value in classifying this group for 
athletic competition in a battery of track and field events but that 
the Sargent Jump will be found extremely valuable. The correlation 
between the track and field battery and the Sargent Jump runs 
slightly above .6 and when age is added R = .625. 

Van Dalen50 studied a group of high school boys fifteen to
seventeen years old. He concluded that the Sargent Jump is a valuable
test to predict ability to develop power provided it is practiced and 
correctly administered. 

47McCloy, C. H., Tests and Measurements in Health and Physical Education,
pp. 58-65.

48McCloy, C. H., op. cit.

49Adams, Eleonore G., "The Study of Age, Height, Weight, and Power as
Classification Factors for Junior High School Girls," Res. Quart. Am. Phys.
Educ. Assoc., Vol. V, No. 2 (May, 1934), 95-100.

50Van Dalen, Deobold, "New Studies in the Sargent Jump," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 2 (May, 1940), 112-115.

51Henry, Franklin, "The Practice and Fatigue Effects in the Sargent Test," Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1 (March,
1942), 16-29.

Henry51 concludes that both fatigue and practice affect the
reliability of the Sargent test, and states that reliability ranges
from r = .77 for one unpracticed trial to r = .97 for the
average of ten trials. He further reports that there seems to be
confirmation that validity coefficients range from .48 to .59 as a
measure of general athletic ability and from .53 to .75 as an index
of athletic power.
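Henry's two reliability figures can be cross-checked with the Spearman-Brown prophecy formula, which the text itself does not mention. The sketch below treats the ten trials as parallel measures; that is an idealization, since Henry reports practice and fatigue effects, so the agreement is approximate rather than a derivation of his result.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of the average of k parallel trials (Spearman-Brown prophecy formula)."""
    return k * r_single / (1 + (k - 1) * r_single)


# Single unpracticed trial r = .77, projected to the average of ten trials.
print(round(spearman_brown(0.77, 10), 3))  # 0.971
```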

The MacCurdy Physical Capacity Test.52 In this test physical
capacity is taken to mean that type of performance and achievement
to be found in vigorous team games and sports. An index, known as
the Physical Capacity Index, has been established, and this appears
to be "an excellent measure of the muscular potentialities required
for physical achievement,"53 but does not measure specific game
skills. The underlying principle in this Physical Capacity Index is a
mechanical one, Power = Force × Velocity, in which the force is
measured in pounds by dynamometers and includes the strengths
of legs, back, hands, and arms, while the velocity is considered as the
individual's ability to overcome the force of gravity and is measured
in inches by a vertical jump. The Physical Capacity Index is then
the total force multiplied by the vertical jump, divided by 100.
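The Physical Capacity Index computation can be sketched as follows. The six dynamometer items (right and left grip read separately, plus one arm-pull and one arm-push reading) and all the readings are hypothetical; the text says only that the force covers legs, back, hands, and arms.

```python
def physical_capacity_index(forces_lb: dict, vertical_jump_in: float) -> float:
    """Physical Capacity Index = (total dynamometer force in pounds x vertical jump in inches) / 100."""
    total_force = sum(forces_lb.values())
    return total_force * vertical_jump_in / 100


# Hypothetical dynamometer readings in pounds; item names are illustrative only.
forces = {
    "legs": 400,
    "back": 300,
    "right_grip": 100,
    "left_grip": 90,
    "arm_pull": 150,
    "arm_push": 160,
}
print(physical_capacity_index(forces, 20))  # 240.0
```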

Strengths of back and legs are measured as in the Rogers' Strength
Test, but arm-pulling force and arm-pushing force are measured, not
by pull-ups and push-ups, but by a dynamometer arranged on a
gymnasium horse so that the individual can pull with his chest against
the end of the horse or push with his back against it. While the same
muscle groups are used as in pull-ups (chins) and dips, strengths are
here recorded in pounds and are therefore comparable to the strengths
recorded for back, legs and grip.

Criteria have been set up to establish validity using the biserial
method of correlation. The coefficient of correlation between
athletic achievement and physical capacity for a group of boys from
fifteen and one-half to twenty years of age is very high (biserial
r = .95). The reliability of the Physical Capacity Index is also high
(r = .95). Growth curves for boys from ten to twenty years of age in
Muscular Velocity, Muscular Power, Athletic Performance and Muscular
Force are shown. These curves indicate that "the neuromuscular
functions reach maturity at approximately the eighteenth year.
In other words maximum athletic skill may not be expected before
the boy reaches the age of eighteen years."54
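MacCurdy's computing routine is not given here, but the textbook biserial formula can be sketched as below: r_b = (M_p - M_q) / σ × (pq / y), where M_p and M_q are the mean scores of the two criterion groups, σ is the standard deviation of all scores, and y is the height of the unit normal curve at the point dividing the proportions p and q. The function name and the toy data are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(scores, in_upper_group):
    """Biserial r between a continuous score and a two-category criterion.

    r_b = (M_p - M_q) / sigma * (p * q / y), where y is the height of the unit
    normal curve at the point that divides the proportions p and q.
    """
    upper = [s for s, g in zip(scores, in_upper_group) if g]
    lower = [s for s, g in zip(scores, in_upper_group) if not g]
    if not upper or not lower:
        raise ValueError("both criterion groups must be non-empty")
    p = len(upper) / len(scores)
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # ordinate at the dividing point
    sigma = pstdev(scores)  # standard deviation of all scores
    return (fmean(upper) - fmean(lower)) / sigma * (p * q / y)


# Toy data: four scores, the top two judged "successful" on the criterion.
print(round(biserial_r([1, 2, 3, 4], [False, False, True, True]), 3))  # 1.121
```

The toy result exceeds 1, which the biserial coefficient can do when the continuous scores are far from normally distributed; the formula assumes a normal trait underlies the two-category criterion.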

It is assumed in this very excellent study that the Physical
Capacity Index can be applied to all ages from the Junior High
School through college because the data are comparable to the
data used by Rogers55 in establishing his strength index.
Such an assumption is hardly justified in view of the fact that the
relationships of age, height and weight to performance at the college
age level are not of the same variety as at lower age levels.56 Since
MacCurdy points out that neuromuscular functions reach maturity
at approximately the eighteenth year, it would seem only logical to
conclude that the relationships just mentioned may differ widely
after eighteen years from those which exist before eighteen years.

52MacCurdy, Howard Leigh, A Test for Measuring the Physical Capacity of
Secondary School Boys. Yonkers, New York, The author, 1933.

53MacCurdy, Howard Leigh, op. cit., p. 40.

54Ibid., p. 41.


Selected References 


BOVARD, JOHN F. and COZENS, FREDERICK W.: The "Leap-Meter," An Investigation
into the Possibilities of the Sargent Test as a Measure of General Athletic
Ability. Eugene, University of Oregon Press, 1928. Pp. 28.

The Sargent Jump Test is not related as closely to general athletic ability
with college men as with younger age groups, but when strictly heterogeneous
groups are chosen the relationship may increase to a point of considerable
usefulness.


CARPENTER, AILEEN: "An Analysis of the Relationship of the Factors of Velocity,
Strength, and Dead Weight to Athletic Performance," Res. Quart. Am. Assoc.
for Health, Phys. Educ., and Rec., Vol. XII, No. 1 (March, 1941), 34-39.

Reports on an important theoretical study on the interrelationships of
strength, velocity, and dead weight as they apply to athletic performance.


CLARKE, H. HARRISON: "Objective Strength Tests of Affected Muscle Groups
Involved in Orthopedic Disabilities," Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. 19, No. 2 (May, 1948), 118-147.

Describes available objective instruments and apparatus which can be used
in orthopedic strength testing, various test items, and the results of research
completed in this area.


CURETON, THOMAS K., JR. and LARSON, LEONARD: "Strength as an Approach to
Physical Fitness," Suppl. to Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. 12, No. 2 (May, 1941), 391-406.

Gives an interpretation of strength as an aspect of fitness, a review of strength
testing and fitness programs, a classification of strength programs, a review
of the relationship of strength tests to athletic performance, and a discussion of
the limitations of strength tests as an indicator of health and fitness.

55Rogers, Frederick Rand, Physical Capacity Tests in the Administration of Physi-
cal Education. New York, Teachers College, Columbia University, Contribu-
tions to Education No. 173, 1925.

56Cozens, Frederick W., "A Study of Stature in Relation to Physical Perform-
ance," Res. Quart. Am. Phys. Educ. Assoc., Vol. I, No. 1 (March, 1930), 38-45.
Also a great deal of unpublished material from which were derived indices for
classifying boys for purposes of competition. See Frederick W. Cozens, Martin
H. Trieb and N. P. Neilson, Physical Education Achievement Scales for Boys in
Secondary Schools, p. 11. New York, A. S. Barnes and Company, 1936.
Also: Cozens, Frederick W., "Strength Tests as Measures of General Athletic
Ability in College Men," Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. XI, No. 1 (March, 1940), 45-52.

HINTON, EVELYN A. and RARICK, LAWRENCE: "The Correlation of Rogers' Test
of Physical Capacity and the Cubberley and Cozens Measurement of Achieve-
ment in Basketball," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. XI, No. 3 (October, 1940), 58-65.

A typical research study showing interrelationships between standardized
tests. Reports a .809 coefficient of correlation between Rogers' Strength Index
and a basketball achievement score computed from the scales of Cozens, et al.,
with college women as subjects. A .550 correlation was found between the
basketball test and arm strength alone.

DEWITT, R. T.: "A Comparative Study of Three Types of Chinning Tests,"
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 3 (October,
1944), 240-248.

Shows the necessity of specifying procedure in chinning tests since on the
average one can perform approximately two more chins when the palms-in
grip is used than when the palms-out grip is used.

DANIELS, LUCILLE, WILLIAMS, MARIAN and WORTHINGHAM, CATHERINE: Muscle
Testing, pp. 9-15. Philadelphia, W. B. Saunders Company, 1947.

Reviews and critically analyzes proposed manual orthopedic strength tests.
This source and the preceding one are of particular interest to the student of
corrective physical education.


METHENY, ELEANOR: "The Present Status of Strength Testing for Children of
Elementary School and Preschool Age," Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 12, No. 1 (March, 1941), 115-150.

Presents a thorough review of the use of strength tests for elementary school
children, indicates findings on relationships between strength and other factors,
makes recommendations on use of strength testing with this group, and includes
an excellent bibliography.

McCLOY, CHARLES H.: "Tests of Strength as Measurements of Physical Status,"
Chapter VII, Appraising Physical Status, pp. 60-69. Iowa City, University of
Iowa Studies, Vol. XV, No. 2 (June, 1938).

Reviews the problem of the relation of strength to physical status, presenting
arguments in favor of strength as a measure of general fitness for working
and living.


CHAPTER VIII 


The Measurement of General Qualities: 
Motor Ability, Capacity, and 
Educability; Neuromuscular Control 


Tests of Motor Ability, Capacity and Educability 


Motor educability refers to "the ease with which an individual
learns new skills"1; motor capacity, to "one's innate potentialities
. . . the limit to which the individual may be developed"2; and
motor ability, to the level to which one has developed his innate
capacity to learn motor skills. General athletic ability is closely
related to general motor ability, and refers to one's acquired level of
learning in skills common to all athletic performance.

Advances in the measurement of mental abilities, and the positive
results in the development of mental intelligence tests, quickened the
interest of physical educators to explore the realm of motor intelli-
gence. This section describes some of the outstanding efforts in
[...] measures. The difficulties of [...]
innate from acquired skill, plu[...]
general motor ability, or that acquired skill in the abilities common
to all motor performance, have given rise to the development of
several test batteries entitled general motor or athletic ability.


1McCloy, C. H., "A Program of Tests and Measurements for the Public Schools,"
Jr. of Health and Phys. Educ., Vol. VI, No. 8 (October, 1935), 19+.

2McCloy, C. H., "The Measurement of General Motor Capacity and General
Motor Ability," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1
(March, 1934), 46-61.


Motor Ability, Capacity and Educability 145


These serve useful purposes for diagnosis and prescription of activity 
to meet individual needs. 

Brace's Scale of Motor Ability Tests.3 By means of scientific
procedures Brace, who has done pioneer work in this area, de- 
veloped a scale of motor ability tests for use in measuring native 
motor ability (motor capacity) ages eight to eighteen. The scale 
consists of twenty events, two batteries of ten events each, in the 
nature of stunts which are easy of administration and simple to 
score. (All scores are recorded on the success or failure basis.) 
1. Walking in a straight line.
2. Stand, jump into the air and clap both feet together once, and
land with feet apart any distance.
3. [...] flat on the floor.
4. [...] recover with arms folded behind the back.
5. Floor push up three times in succession.
6. Floor spring with hands clasping toes to feet apart and arms
sideward.
7. Full turn left in the air and land without losing balance.
8. As in 2 except feet clapped together twice.
9. Bending and touching left knee to floor while grasping left foot
behind right knee.
10. Jumping through loop formed by grasping one toe with opposite hand.
11. Jump into the air and slap both heels with hands behind back.
12. High kick so that toes come at least level with shoulders.
13. Forward bend with both hands and head touching floor and
right leg extended backward.
14. Full bend knees with arms between knees and around ankles,
holding for five seconds.
15. As in 7 turning right.
16. Jumping to feet from kneeling position.
17. Crossed legs and arms. Sit [...].
18. [...] right foot against left knee.
19. [...] for five seconds.
20. [...] dip [...] foot extended forward and recover position.


3Brace, David K., Measuring Motor Ability, p. 105, 1927, by A. S. Barnes and
Company. Complete details for administering the test are given in this book.

4Brace, David K., op. cit., p. 99.


The following claims are made with reference to the application
of the scale of motor ability tests:4


1. Motor ability scale scores may be used as the basis for deter- 
mining an accomplishment quotient for the activities of physical 
education. 

2. The scale of tests should form a valuable element in a scheme 
of classification of pupils for class work in physical education. This 
should be especially valuable in programs which favor the individual 
development of pupils. 

3. Diagnosis of special performance disabilities may be assisted 
by a study of individual reactions to these tests. 

4. Experimental studies in physical education which require the
equating of groups of pupils have been greatly handicapped by the 
lack of standardized tests of general motor ability. 

5. Finally, the scale of tests should stimulate other scientific ef- 
forts in the field of tests and measurements in physical education. 
Leads thus far established should assist other research on the nature 
of motor ability. 


Brace made an exceedingly valuable contribution to the test and 
measurement program. His procedures are scientific and statis- 
tically sound and the test has had wide use among those scientifically 
inclined. 

Vickers, et al.,5 describe a modification of the Brace Test which
combines items of the two batteries for use with children from five
to nine years of age. 

In a recent study Brace6 concluded that his test "does not
measure motor learning to an extent that would justify the test
being classified as a test of motor educability," and that the "test is
slightly superior to the Iowa revision of the Brace Test7 as a measure
of motor learning." Gire and Espenschade8 support his findings.
Anderson and McCloy9 hold that while the Brace Test is a valuable
test it "seems to be more of a test of general motor ability than a
specific test of motor educability."

5Vickers, Vernette S., Poyntz, Lillian and Baum, Mabel Pottinger, "The Brace
Scale Used with Young Children," Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. 13, No. 3 (October, 1942), 299-308.

6Brace, David K., "Studies in Motor Learning of Gross Bodily Motor Skills,"
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 17, No. 4 (Decem-
ber, 1946), 242-253.

7See p. 147.

8Gire, Eugenia and Espenschade, Anna, "The Relationship between Measures of
Motor Educability and the Learning of Specific Motor Skills," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1 (March, 1942), 43-56.

9Anderson, Theresa and McCloy, C. H., "The Measurement of Sports Ability
in High School Girls," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 18, No. 1 (March, 1947), 2-11.

Espenschade10 reports that improvement in the stunt type motor
ability tests can be brought about by participation in activities 
which develop strength, flexibility, coordination and control, but 
practice in the test items themselves in addition to the other activ- 
ities does not influence improvement. 

The Iowa Revision of the Brace Scale of Motor Ability
Tests.11 Despite "a relatively high correlation with every type of
motor ability test with which it is used,"12 McCloy feels that the
Brace Scale of Motor Ability Tests seems to lack “something quite 
essential" as a measure of general motor capacity. In the revision 
McCloy retained ten of the items in the original battery, added new 
material and changed the administration and scoring. “This re- 
vision has resulted in approximately doubling the validity of the 
test." It is now used as one of the elements in measuring general 
motor capacity. 

10Espenschade, Anna, "Practice Effects of the Stunt Type Test," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 16, No. 1 (March, 1945), 37-41.

11McCloy, C. H., Test of Motor Educability (Iowa Revision of the Brace Scale of
Motor Ability Tests). Mimeographed and published by the author, second
revision April, 1935. See also: McCloy, C. H., Tests and Measurements in
Health and Physical Education, pp. [...].

12McCloy, C. H., "The Measurement of General Motor Capacity and General
Motor Ability," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1
(March, 1934), 52.

13Johnson, Granville B., "Physical Skill Tests for Sectioning Classes into Homo-
geneous Units," Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 1 (March,
1932), 128-136.

Sectioning Students into Homogeneous Teaching Units.
Johnson,13 using 1500 pupils of both sexes ranging in age from eleven
to thirty-eight, has set up a battery of tests (or exercises) which
attempt to measure "native neuromuscular skill capacity." All
exercises which involve pronounced elements of "strength, speed,
endurance, fear, familiarity, strangeness or practice" were
eliminated. The exercises are performed on mats and involve locomotion
for a distance of 15 feet but require no special apparatus except a piece
of canvas, placed over two standard 5 × 10 foot gymnasium mats, on
which is painted a pattern. The pattern consists of 18-inch squares
along the two outside lanes of the 4½ × 15 foot area, alternate squares
being painted black starting with the second square. The middle lane
between the two outside lanes is not marked off in squares but contains
targets 3 × 12 inches in the center of alternate imaginary squares
starting with the first.

Ten exercises are demonstrated by the instructor, one at a time, 
and pupils perform each exercise before the next is introduced. The 
exercises consist of straddle jump, stagger skip, stagger jump, for- 
ward skip holding the opposite foot from behind, front roll, jumping 
half-turns, right or left, back roll, jumping half-turns, right and left 
alternately, front and back roll combination, and jumping full turns. 

A score of 10 is given for a perfect execution of each exercise and 
points are deducted for performances in which the pupil over-steps 
or misses squares, fails to land on both feet at the same time, fails to 
maintain rhythm, etc. The individual's final score is on the basis of 
a maximum of 100 points. Pupils are grouped into any number of 
sections according to their total score. 
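The scoring and sectioning procedure can be sketched as below. The text does not say how many points each fault costs or how section boundaries are drawn, so the per-exercise deduction list, the floor of zero per exercise, and the equal-size split into sections are all hypothetical choices.

```python
def johnson_total(deductions):
    """Total score: each of the ten exercises starts at 10 points, minus points deducted for faults."""
    if len(deductions) != 10:
        raise ValueError("the Johnson battery has ten exercises")
    # Assumption: an exercise cannot score below zero.
    return sum(max(0, 10 - d) for d in deductions)


def section_by_score(totals, n_sections):
    """Rank pupils by total score and split them into n nearly equal sections (hypothetical rule)."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    base, extra = divmod(len(ranked), n_sections)
    sections, start = [], 0
    for i in range(n_sections):
        size = base + (1 if i < extra else 0)  # the first `extra` sections get one more pupil
        sections.append(ranked[start:start + size])
        start += size
    return sections


print(johnson_total([0, 1, 0, 2, 0, 0, 3, 0, 0, 0]))  # 94
totals = {"Ann": 92, "Ben": 78, "Cal": 85, "Dee": 60, "Eve": 71}
print(section_by_score(totals, 2))  # [['Ann', 'Cal', 'Ben'], ['Eve', 'Dee']]
```

An equal-size split keeps teaching units balanced in number. Cutting at fixed score limits would instead keep each unit narrower in ability, at the cost of unequal class sizes.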

A validity coefficient of .69 and a reliability coefficient of .97 are
given, but no mention is made of the criteria used in validation.
Despite the potential usefulness of a test of this type, it must be
pointed out that a validity coefficient as low as .69 still leaves
the standard error of estimate so large that it is reduced less than
one third from a pure guess. Gire and Espenschade14 found a
reliability coefficient of only .61 for the test with high school girls.
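The "less than one third" statement follows from the standard relation SE_est = SD_y × √(1 − r²): the fractional reduction in error relative to a pure guess is 1 − √(1 − r²), sometimes called the index of forecasting efficiency. A short check, with a hypothetical function name:

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Fractional reduction in the standard error of estimate relative to a pure guess.

    Uses SE_est = SD_y * sqrt(1 - r**2), so the reduction is 1 - sqrt(1 - r**2).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return 1 - math.sqrt(1 - r * r)


e = forecasting_efficiency(0.69)
print(round(e, 3), e < 1 / 3)  # 0.276 True
```

Solving 1 − √(1 − r²) = 1/3 gives r = √5/3 ≈ .745, so a validity coefficient of about .75 would be needed before the error of estimate is cut by a full third.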

An interesting observation, made by the test maker, shows that
"a child of twelve years had no more difficulty in executing the
exercises than did the college freshman."15 Also it is pointed out
that the relationship existing between intelligence scores and
physical test scores is markedly different for the extreme age groups:
for college students r = .49, while for junior high school pupils
r = .15. This is difficult to explain except as a possible "manifestation
of maturation." In a more recent study Johnson16 found no
significant relationship between his test and general intelligence as
measured by Thurstone's Psychological Examination for College
Freshmen.

14Gire, Eugenia and Espenschade, Anna, op. cit.

15Johnson, Granville B., op. cit., p. 133.

16Johnson, Granville B., "A Study of the Relationship that Exists Between
Physical Skill as Measured, and the General Intelligence of College Students,"
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1 (March,
1942), 57-59.

Metheny suggests that "combinations of Johnson's exercises of
5 (front roll), 7 (back roll), 8 (jumping half turns) and 10 (jumping
full turns) for boys, and of 5, 7, and 8 for girls, might well be used
instead of the whole Johnson test, and that the results of these
combinations would be as much value in the measurement of motor
educability and for sectioning classes as would the original Johnson
test, and would greatly reduce the amount of time required for the
administration of the test."17 The combination of 5, 7, 8 and 10
for boys produced a correlation coefficient of .977 with the total
Johnson score, and 5, 7, and 8 for girls, .868. She also suggests a
simplified mat which can be easily constructed for use with the
modified battery.

McCloy's Test of General Motor Capacity. General motor
capacity as used here refers to "inborn, hereditary potentialities for
general motor performance. The word general in this definition is in
contrast to specific abilities such as the ability to play basketball."18
McCloy goes on to explain that this test is not to be regarded as a
test of specific skills or abilities:

Events which demand a high degree of developed special abilities 
will usually correlate less highly with the test than those events 
which do not require such abilities. This is particularly true of 
events which require a specialized development that might be 
called character qualities, such as physical courage, quick thinking, 
and aggressiveness, as found in football. Such traits have no
relationship to this test. As a result the general motor capacity score
correlates highly with track and field, but only correlates about .7 
with football and basketball in high school groups. Therefore, this 
test should not be used to attempt to predict those things for which 
it is not designed. Its use should be restricted to the prediction of 
motor capacity, remembering that it will predict potential levels to 
which the individual may attain more accurately than it will test 
his present developed abilities.19


17Metheny, Eleanor, "Studies of the Johnson Test as a Test of Motor Educa-
bility," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 4
(December, 1938), 105-114.

18McCloy, C. H., The Measurement of General Motor Capacity. Mimeographed
and published by the author, State University of Iowa, 1933. Here may be
found tables and standards for all age groups of both sexes. See also: McCloy,
C. H., Tests and Measurements in Health and Physical Education, pp. 122-132,
561-565.

19McCloy, C. H., "The Measurement of General Motor Capacity and General
Motor Ability," Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1
(March, 1934), 57. See also: McCloy, C. H., Tests and Measurements in Health
and Physical Education, pp. 122-132.


Suitable criteria were developed for both boys and girls, and 
formulae, each containing four separate test elements, were derived 
for four age groups: (1) elementary school boys, (2) junior and 
senior high school boys, (3) elementary school girls, and (4) junior 
and senior high school girls. These formulae are to be used for com- 
puting general motor capacity scores. 

Ehrlich,20 using the fencing lunge in measuring learning of a
motor skill, found a multiple correlation coefficient of .674 between 
McCloy’s Motor Capacity Test and improvement in accuracy of 
total body movement, and that his test “is a satisfactory diagnostic 
instrument for evaluating potential learning when both accuracy 
and speed of muscular movements are involved in a motor skill.” 

McCloy's Test of General Motor Ability. The term "ability"
here refers to a developed capacity. The author feels that the test
should avoid highly specialized skills and mastery of specific achieve-
ment. To develop a criterion a large number of separate items were
combined into a total score. In analyzing the various items it was
discovered "that there were two types of tests, excellence in which
is accompanied by excellence in almost all other abilities that could
be expected to measure general motor ability. These tests were:
(1) a combination of three or four track and field events scored in
points, together with (2) a strength test. These two tests combined
in a regression equation gave as high a prediction of general motor
ability as was given by any other combination of events. Other
items added to this battery gave no other significant additional
predictive value."21 Tables for scoring the tests are provided by
the author.22

20Ehrlich, Gerald, "The Relation Between the Learning of a Motor Skill and
Measures of Strength, Ability, Educability, and Capacity," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 1 (March, 1943), 46-59.

21McCloy, C. H., op. cit., p. 57.

22McCloy, C. H., The Measurement of General Motor Capacity. Mimeographed
and published by the author, State University of Iowa, 1933.

23Carpenter, Aileen, "The Measurement of General Motor Capacity and General
Motor Ability in the First Three Grades," Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 13, No. 4 (December, 1942), 444-465.

General Motor Ability and Capacity Tests for the First
Three Grades. Carpenter23 has developed formulae, patterned
after the work of McCloy, for the prediction of general motor ability
and general motor capacity of boys and girls in the lower elementary
grades. A General Motor Ability score is computed from a battery
consisting of weightings of the broad jump, shot-put and body
weight. General Motor Capacity is predicted from a formula com-
posed of weightings of the Sargent Jump, Burpee test, Brace type
stunt test and McCloy's Classification Index. A General Motor
Achievement Quotient, which indicates how good a pupil is compared
to how good he is capable of being, can be derived from these two
scores. Similar formulae with other combinations of activities are
presented by the author with directions for giving the various tests,
and scoring tables.
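The text does not print Carpenter's quotient formula. Assuming it follows the familiar quotient convention (the General Motor Ability score divided by the General Motor Capacity score, times 100), a sketch would look like this. The function name and the pupil's scores are hypothetical.

```python
def achievement_quotient(ability_score: float, capacity_score: float) -> float:
    """Assumed form of the General Motor Achievement Quotient: 100 x ability / capacity."""
    if capacity_score <= 0:
        raise ValueError("capacity score must be positive")
    return 100 * ability_score / capacity_score


# Hypothetical pupil: ability score 450 against a capacity score of 500.
print(achievement_quotient(450, 500))  # 90.0
```

Under this assumed form, a quotient below 100 would mean the pupil is performing below the level his capacity score predicts.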

Garfiel's Motor Ability Test for College Women. After a
careful study of the concept of motor ability, Garfiel24 came to the
conclusion that the aspects of motor ability may be thought of under
five heads:

1. Speed of voluntary movement.
2. Accuracy of voluntary movement.
3. Control of involuntary movement or steadiness.
4. Strength.
5. Motor adaptability, i.e., capacity to solve the motor situations
or to make a new coordinated movement accurately.


It will be seen that this classification takes into consideration
motor tests of both the finer or smaller muscle groups as well as the
big-muscle groups. Up to this time the finer muscle groups had not
been considered in the measurement of motor ability as the term is
commonly thought of in connection with physical education tests.

The real purpose behind Garfiel's experiment was to set up a 
battery of tests measuring motor ability so that she might determine 
the relationship (if any) between motor ability and intelligence.25

24Garfiel, Evelyn, op. cit.

25For [...] of this subject, see Frederick W. Cozens, "Status of the Problem
of the Relation of Physical to Mental Ability," Am. Phys. Educ. Rev., XXXII
(March, 1927), 147-155.

The subjects used in the experiment were Barnard College women
and the tests used were sorted out from a list of 16 purporting to
measure motor ability. The criterion selected was the combined
judgment of eight competent judges on the degree of motor ability
possessed by each of the subjects. This combined judgment was
found to be a meaningful and reliable one, having a correlation with
a second rating of .92. Because the final battery of tests had a
correlation of .794 with the criterion, it was concluded that "the use
of the term motor ability26 is justified and we may begin to
investigate the relationship of this group of abilities to intelligence
and to various special abilities for vocational guidance, and that we
may begin at once to use the tests as a means of classification of
students for work in physical education because these so nearly
approximate in seventeen minutes (duration of test) the judgment of
eight teachers and students after six months or more of acquaintance."
The reliability of the scale is .77.

The final battery of tests with the assigned weights for optimum 
rating is as follows: 


Test                                                   Assigned weight
1. Running (100-yd. dash) ................................ 100
2. Picking up paper (see description, p. 164) ............ 5
3. Strength of back, by dynamometer ...................... 4
4. [illegible] ........................................... 30
5. [illegible] ........................................... -2
6. Tapping (see description, p. 161) ..................... 3
7. Leg strength, by dynamometer .......................... [illegible]
8. Hand strength, by dynamometer ......................... 8


University of Oregon Motor Ability Test.²⁷ In an endeavor
to set up a battery of tests which can be used to classify freshman
college women according to their motor ability, the investigators
decided upon a procedure similar to that used by Cozens.²⁸ Judg-
ments were secured from representative physical educators through-
out the country on the fundamental bodily skills necessary for
success in all branches of motor activity. The final classification
resulted in eleven fundamental bodily skills and fourteen objective
tests by which to measure them. The criteria used for validating
the composite score of the fourteen tests as a measure of motor
ability are: (1) analysis of motor ability into its component parts
by experts, (2) wide sampling of ability in the fourteen tests, (3) the
high average scores for athletes in comparison to the average group,
(4) judgment ratings of university instructors in physical education,
and (5) five-term grades in physical education.

26 The term "motor ability" as used here refers to the ability to make
strong, accurate and quick movements, the general ability shown in
gymnasium and sport activities.

27 Alden, Florence D., Horton, O'Neal, Margaret, and Caldwell, Grace Marie,
"A Motor Ability Test for University Women for the Classification of Entering
Students into Homogeneous Groups," Res. Quart. Am. Phys. Educ. Assoc.,
Vol. III, No. 1 (March, 1932), 85-120.

28 Cozens, Frederick W., The Measurement of General Athletic Ability in College
Men. Eugene, University of Oregon Press, 1929.


Six tests were selected as possibilities for a preliminary or trial 
battery. These tests have reliability coefficients ranging between 
.63 and .87, correlations with the composite score ranging between 
.37 and .70, and correlations with a judgment rating of instructors 
between .30 and .76. On further experimental work some of the 
correlation coefficients between the separate tests and the composite 
score were raised materially (in one instance, from .37 to .64). 

The final battery selected consists of four tests: (1) 40-yard maze 
run, (2) ball change, (3) trunk bend, and (4) jump and reach. Just 
what multiple correlation coefficient this battery yields with the 
criterion is not given although the indication is that the battery 
gives the highest multiple correlation with the composite score as 
well as the judgment criterion. In one case of a judgment rating 
with the composite score, a correlation as high as .60 is listed. The 
raw scores in this battery of tests are changed to percentile ranks 
by means of a table of equivalents, a portion of which is shown in the 
reference. This permits comparisons between individuals for the 
entire battery. Test directions are also appended. 
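The actual table of equivalents appears only in the reference. As a sketch of how such a table can be built from a norm group, the code below uses the common midpoint definition of percentile rank (the percentage of the group scoring worse, with tied scores counted as half). The function names and norm data are hypothetical. Because the maze run is timed, the sketch lets a lower raw score count as the better performance.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores, lower_is_better=False):
    """Midpoint percentile rank of `score` within `norm_scores`."""
    ordered = sorted(norm_scores)
    n = len(ordered)
    below = bisect_left(ordered, score)          # strictly smaller raw scores
    ties = bisect_right(ordered, score) - below  # equal raw scores
    above = n - below - ties                     # strictly larger raw scores
    worse = above if lower_is_better else below
    return 100.0 * (worse + 0.5 * ties) / n


def equivalents_table(norm_scores, lower_is_better=False):
    """Map each distinct raw score in the norm group to its percentile rank."""
    return {
        s: percentile_rank(s, norm_scores, lower_is_better)
        for s in sorted(set(norm_scores))
    }


# Hypothetical jump-and-reach scores (higher is better) and maze-run
# times in seconds (lower is better).
jumps = [10, 12, 12, 14, 16, 18, 20, 22, 24, 26]
maze_times = [9.8, 10.1, 10.4, 10.9, 11.5]
print(percentile_rank(14, jumps))                               # 35.0
print(percentile_rank(10.1, maze_times, lower_is_better=True))  # 70.0
```

Building the whole table once and looking scores up in it mirrors the printed table of equivalents; computing ranks on demand gives the same values.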

In general it should be pointed out that the reliability coefficient
of the ball change (r = .41 to .54) makes it almost useless for in-
clusion in a battery designed for diagnostic purposes. It should be
noted also that such a reliability coefficient will materially reduce
the relationship of the short battery with the criterion. It would
seem reasonable to believe that higher reliability coefficients should
be secured throughout, but even so the battery reliability coefficient
may run as high as .85, by assuming fairly low intercorrelations
between the tests.
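The closing claim can be explored with a standard classical-test-theory formula for the reliability of an unweighted sum of standardized tests: one minus the summed error variances (1 - r for each test's reliability r) divided by the variance of the sum (the number of tests plus twice the sum of their intercorrelations). This sketch is not the authors' calculation, and the reliabilities and intercorrelations in the example are hypothetical, chosen only to show that a battery containing one weak test can still reach the middle .80s.

```python
def composite_reliability(reliabilities, corr):
    """Reliability of an unweighted sum of standardized (unit-variance) tests.

    reliabilities: reliability coefficient of each test.
    corr: full k x k intercorrelation matrix (diagonal entries are ignored).
    """
    k = len(reliabilities)
    # Variance of the sum: k unit variances plus every off-diagonal covariance.
    total_var = k + sum(corr[i][j] for i in range(k) for j in range(k) if i != j)
    # Error variance of a standardized test with reliability r is 1 - r.
    error_var = sum(1.0 - r for r in reliabilities)
    return 1.0 - error_var / total_var


def uniform_corr(k, r):
    """k x k matrix with every off-diagonal intercorrelation equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


# Hypothetical four-test battery that includes one weak test (r = .50).
rels = [0.87, 0.50, 0.75, 0.80]
print(round(composite_reliability(rels, uniform_corr(4, 0.30)), 3))  # 0.858
```

Because the error variances simply add while the shared variance grows with the intercorrelations, the composite can be noticeably more reliable than its weakest member.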

The Minnesota Motor Ability Tests for College Women.²⁹
In this study much the same procedure was followed with women
that Cozens³⁰ followed with men. The quality known as motor
ability was analyzed and a number of tests formulated under each
of the seven general elements. It was shown that the composite
score was very acceptable as a criterion and that a number of short
batteries of tests could be devised from a total of seventeen valid
and reliable tests. Each battery was weighted to produce the highest
possible correlation with the criterion. The final, weighted battery
was made up as follows:


Test                                             Weight
Medicine ball throw for distance ............... [illegible]
[illegible] .................................... [illegible]
Standing broad jump ............................ [illegible]
Forward rolls for time (16-foot mat) ........... [illegible]
[illegible] .................................... [illegible]

r_oc = .918            [symbol illegible] = 38.78
r₁₁ (battery) = .956   Standard error of estimate = 16.6

29 Graybeal, Elizabeth, Minnesota Motor Ability Tests for College Women. Part
of a study, Measurement of Outcomes of Physical Education. Minneapolis:
University of Minnesota Press, 1937.

30 Cozens, Frederick W., The Measurement of General Athletic Ability in College
Men. Eugene, University of Oregon Press, 1929.


The Humiston Motor Ability Test for College Women.³¹
In this study a battery of tests involving seven items was developed 
for measuring the present status of motor ability in college women. 
It is recommended for use as a classifying device for purposes of 
instruction and for fairness in competition in intramural games, for 
indicating the progress of students in class activities, as a criterion 
for validating special skill tests and finally as a part of an examina- 
tion for selecting major students in physical education. 

Following a detailed analysis of expert opinion on the funda- 
mentals embraced in motor ability, fifteen items were selected to 
represent the composite score of each individual. By recognized 
statistical techniques seven items were finally selected as best 
measuring the ability represented by the composite score. These 
seven items are administered as a single consecutive unit and this 
unit has a reasonable degree of relationship with the composite 
score (r = .81). When administered as separate items the battery of
tests has a much higher relationship with the composite score 
(r = .92). 

In administering the single-unit test, the subject must: 


1. Follow the pattern of a maze run; 

2. "Lie down and roll over on a mat and rise to feet;"

3. Climb over a box after a short run;

4. "Run and turn in a circle and continue on between barriers;"

5. Climb and descend a perpendicular ladder; 

6. Take a basketball from an assistant, run to a rope stretched 
between standards, toss the ball over the rope and catch it; 

7. Finally complete a short straightaway run. 


31 Humiston, Dorothy, A Measurement of Motor Ability in Women. Unpublished
doctor's dissertation, New York University, 1936. See also Res. Quart. Am. Phys.
Educ. Assoc., Vol. VIII, No. 2 (May, 1937), 181-185.


The elapsed time in seconds and tenths from start to finish 
constitutes the student’s score. 
The validity of this test rests upon the following findings: 


1. The degree of relationship between the single unit and the
composite score, and the fact that the composite score represents the 
judgment of experts as to what fundamentals are embraced in 
motor ability. 

2. The highly reliable difference in scores between athletes and 
nonathletes. 

3. Sport teams grouped according to scores into three levels of
ability show the same relative ability in winning games. 

4. Advanced major students are shown to be distinctly superior 
as a group to freshmen. 

The relationship between teacher judgments and ability does not 
appear to add much to validation findings. 


Scott's General Motor Ability Test for Girls and Women.³²
After extensive experimentation Scott recommends a minimum
motor ability test battery consisting of an obstacle race, basketball
throw for distance, and standing broad jump. Suggested for addi-
tion or substitution in the battery are the four-second dash and the
fifteen-second wall pass. A complete description of the test and
T-scales for high school girls, college women, and physical education
major women are given in the source listed. Also included are
suggestive regression equations. For the three item test the equation is:

    2 (basketball throw) + 1.4 (broad jump) - 1 (obstacle race)

For the four item test the equation is:

    .7 (basketball throw) + 2 (dash) + 1 (passes) + .5 (broad jump)


Multiplication tables for use in computing the equations are given. 
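The two regression equations translate directly into code. A minimal sketch, assuming each raw score is already recorded in the units used for Scott's published tables (the text does not state the units); the function names and sample scores are hypothetical. The obstacle race is timed, which is why it enters the three-item equation with a negative weight: a faster run raises the estimate.

```python
def scott_three_item(basketball_throw, broad_jump, obstacle_race):
    """2 (basketball throw) + 1.4 (broad jump) - 1 (obstacle race)."""
    return 2.0 * basketball_throw + 1.4 * broad_jump - 1.0 * obstacle_race


def scott_four_item(basketball_throw, dash, passes, broad_jump):
    """.7 (basketball throw) + 2 (dash) + 1 (passes) + .5 (broad jump)."""
    return 0.7 * basketball_throw + 2.0 * dash + 1.0 * passes + 0.5 * broad_jump


# Hypothetical raw scores for one student.
print(round(scott_three_item(60, 70, 25), 1))     # 193.0
print(round(scott_four_item(60, 20, 12, 70), 1))  # 129.0
```

In code the multiplication tables are unnecessary, since the weights are applied directly.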


The test is useful in classifying students for sections, squads or
teams. It offers a good estimation of expected ability for those
beginning a new activity. The battery meets the criteria of
administrative ease and economy of time.

32 Scott, M. Gladys and French, Esther, Better Teaching Through Testing, pp.
136-153. New York, A. S. Barnes and Company, 1944.
Scott, M. Gladys, "The Assessment of Motor Ability of College Women," Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 3 (October,
1939), 63-83.
Scott, M. Gladys, "Motor Ability Tests for College Women," Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 4 (December, 1943),
402-405.

Motor Ability Test for High School Girls.³³ The Newton
Motor Ability Test was developed by Wellesley College graduate 
students in cooperation with staff members of the Newton, Massa- 
chusetts High School Physical Education Department for Girls. 
The four part battery developed consists of: standing broad jump 
measured to the nearest inch; speed over hurdle course timed to the 
nearest fifth second; scramble test measured to the nearest fifth 
second which involves three and a half round trips from back lying 
rest position to bell device ten feet away; and velocity throw meas- 
ured to nearest pound with use of apparatus connected to a 50-pound 
dynamometer. The battery was correlated against a subjective 
criterion based on teacher judgments; a sports skill criterion based 
on achievement in six basic sport skills; and an objective criterion 
based on scores on eighteen tests of basic elements of motor ability. 
A three item battery consisting of broad jump, hurdles and scramble
was finally selected for use. This battery correlates .754 with the
subjective criterion, and .908 with the objective criterion. Achieve- 
ment scales for the three item battery and for each item within the 
battery have been worked out according to the method proposed 
by Cozens et al.³⁴ The test was found useful at Newton High School
for aiding the teacher to locate markedly superior and markedly 
inferior students early in the semester. 

A Measure of General Athletic Ability for the College
Man.³⁵ The various tests of physical efficiency or general athletic
ability which were brought forward up to the time Cozens developed
his test of General Athletic Ability for College Men were more or less
empirically devised. What phases of all-around physical ability
they measure is not known, because there were no
criteria for evaluating such batteries of tests.

The battery of tests originally worked out³⁶ was revised³⁷ for
administrative reasons and now consists of the following events:


Event                                    Weight or multiplier
Dips ........................................ 0.8
Baseball throw for distance ................. [illegible]
Football punt for distance .................. 1.0
Standing broad jump ......................... 0.9
Bar snaps ................................... 0.5
Dodging ..................................... 1.0
[event name illegible] ...................... 1.5

33 Powell, Elizabeth and Howe, Eugene C., "Motor Ability Tests for High School
Girls," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 4
(December, 1939), 81-88.

34 See p. 106.

35 Cozens, Frederick W., The Measurement of General Athletic Ability in College
Men. Eugene, University of Oregon Press, 1929.

36 Cozens, Frederick W., The Measurement of General Athletic Ability in College
Men, p. 177. Eugene, University of Oregon Press, Physical Education Series,
Vol. I, No. 3, April, 1929.

37 Cozens, Frederick W., Achievement Scales in Physical Education Activities for
College Men, p. 114. Philadelphia, Lea and Febiger, 1936.


These events when used in a battery measure the ability indicated,
correlating very highly with the criterion, and are reliable and
easily administered. Raw scores are transposed into scale scores 
(sigma index scores) and these are multiplied by weights to obtain 
the relative value which each test contributes to the general quality. 
The statistical measures necessary for the use of this battery of 
tests are shown in the accompanying table. 


ESSENTIAL STATISTICAL DATA FOR THE USE OF COZENS'
TEST OF GENERAL ATHLETIC ABILITY


Correlation of battery with criterion .................... .967
Battery reliability coefficient .......................... .968
Mean score ............................................... 350
σ (criterion) ............................................ 65.7
σ (weighted battery) ..................................... 80.9
Standard error of estimate (true criterion ability) ...... 14.8
Standard error of measurement ............................ 14.5
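The last entry in the table can be reproduced from two others. Assuming the tabled value comes from the classical formula for the standard error of measurement, σ√(1 - r), with σ taken as the standard deviation of the weighted battery (80.9) and r as the battery reliability coefficient (.968), the arithmetic gives 80.9 × √.032 ≈ 14.47, which rounds to the tabled 14.5. The function name below is hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


sem = standard_error_of_measurement(80.9, 0.968)
print(round(sem, 1))  # 14.5

# A band of one SEM either side of an observed battery score of 350.
print(round(350 - sem, 1), round(350 + sem, 1))  # 335.5 364.5
```

Read this way, a man scoring 350 on the battery would, roughly two times in three, have a "true" battery score somewhere between about 335 and 365.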


The battery of tests serves as an index for classifying students 
physically as superior, above average, average, below average and 
inferior. This classification is very helpful to the instructor in 
grouping his students for teaching purposes. For instruction and 
practice in fundamental skills he may have all the weak students 
in one group, the average in another, etc. When it comes to com- 
petition within the class period he has a means of dividing the group 
so that competitive teams may be equally matched on the basis of
all-around ability. 
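The scoring procedure described above (raw score converted to a sigma-scale score, multiplied by a weight, and summed) and the five-way classification might be sketched as follows. Several pieces are assumptions not given in the text: the sigma scale is taken to run from 0 to 100 with the norm mean at 50 and three standard deviations on either side; the event norms are invented; only the weights legible in the table above are used; and the classification cutoffs at ±0.5 and ±1.5 standard deviations are hypothetical. For timed events such as the dodging run, a lower raw score is treated as better.

```python
# Weights legible in the printed table; the remaining events are omitted.
WEIGHTS = {
    "dips": 0.8,
    "football_punt": 1.0,
    "standing_broad_jump": 0.9,
    "bar_snaps": 0.5,
    "dodging": 1.0,
}


def sigma_scale(raw, mean, sd, higher_is_better=True):
    """Assumed sigma scale: mean -> 50, +/-3 SD -> 100/0, clipped to 0-100."""
    z = (raw - mean) / sd
    if not higher_is_better:
        z = -z  # a faster time is a better performance
    return max(0.0, min(100.0, 50.0 + z * 50.0 / 3.0))


def battery_score(raw_scores, norms):
    """Sum of weight * sigma-scale score over the events supplied."""
    total = 0.0
    for event, raw in raw_scores.items():
        mean, sd, higher_is_better = norms[event]
        total += WEIGHTS[event] * sigma_scale(raw, mean, sd, higher_is_better)
    return total


def classify(total, mean, sd):
    """Five groups with hypothetical cutoffs at +/-0.5 and +/-1.5 SD."""
    z = (total - mean) / sd
    if z >= 1.5:
        return "superior"
    if z >= 0.5:
        return "above average"
    if z > -0.5:
        return "average"
    if z > -1.5:
        return "below average"
    return "inferior"


# Hypothetical norms: (mean, standard deviation, higher_is_better).
NORMS = {
    "dips": (10, 4, True),
    "football_punt": (35, 7, True),
    "standing_broad_jump": (90, 8, True),
    "bar_snaps": (60, 10, True),
    "dodging": (25.0, 2.0, False),
}
average_man = {event: mean for event, (mean, _, _) in NORMS.items()}
print(round(battery_score(average_man, NORMS), 1))  # 210.0

# Classifying full-battery totals against the tabled mean (350) and σ (80.9).
print(classify(450, 350, 80.9))  # above average
print(classify(200, 350, 80.9))  # inferior
```

Keeping the weights and norms in plain dictionaries makes it easy to substitute the published values once they are at hand.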

A most important use for the battery of tests lies in its diagnostic
power. A review of the scale scores made in each test immediately
gives the examiner a mental view of the physical weaknesses of the
individual so that courses which will assist him in developing his
physical power may be prescribed.

Motor Ability Tests for College Men. To classify students 
homogeneously on the basis of elements underlying motor performance,
Larson³⁸ proposes two motor ability tests, one for use
indoors, and one for use outdoors. The indoor battery combines 
dodge run, bar snap, chinning, dipping and vertical jump, and 
correlates .9687 with a criterion measure of twenty-five motor 
ability items. The outdoor battery combines baseball throw for 
distance, chinning, bar snap, and vertical jump and correlates .9804 
with the criterion. The reliability coefficients for both batteries ex- 
ceed .86. Raw scores must be changed into weighted standard scores 
before summing. Scoring and classification tables for college men 
are given. These tests are easily administered and quickly scored. 

The Predictive Value of Selected Motor Ability Tests. In 
an investigation of the possibilities of predicting sports ability, a 
number of standardized motor ability tests were compared with 
teacher ratings of sports ability for a sample of high school girls. 
Some of the findings reported include:³⁹

The General Motor Capacity Score (McCloy's) is the variable
which has the highest correlation (.812) with the sports ability
rating. . . . The Sargent Jump has the second highest correlation
(.730) with the ratings. It should be re-emphasized that this test
should be given only after adequate practice has been given for the
individuals to learn the form. . . . The Brace Test had the next
highest correlation (.706) with the ratings, and as a single test,
namely, even when not used in combination with other tests, is a
valuable test item. It seems to be more of a general test of motor
ability than a specific test of motor educability. . . . The Hill
Test⁴⁰* had a correlation of .690 with the ratings, the Iowa Brace,
.682, and the Johnson test .678. . . . The ability to change
direction, particularly as measured by the Cozens' dodging run,
contributes a good deal and this contribution is in addition to the
power expended (r = .458). . . . The ability to make quick and
adaptive motor responses also seems to be an important part of the
abilities making up this general sports ability. In the present study
this ability is measured by the Blocks Test.⁴¹

38 Larson, Leonard A., "A Factor Analysis of Motor Ability Variables and Tests
for College Men," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 12, No. 3 (October, 1941), 499-517.

39 Anderson, Theresa and McCloy, C. H., "The Measurement of Sports Ability in
High School Girls," Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 18, No. 1 (March, 1947), 2-11.

40 Hill, Kenneth, The Formulation of Tests of Motor Educability for Junior High
School Boys. Unpublished Master's Thesis, State University of Iowa, 1935.

* Other tests referred to are described elsewhere in this chapter.

41 McCloy, C. H., "Blocks Test of Multiple Response," Psychometrika, Vol. 7,
No. 3 (September, 1942), 165-169.


Neuromuscular Control Tests


From a system of scoring athletic events which are supposed
to contain the elemental qualities of motor ability, we come to
those endeavors to evaluate the qualities directly. We find the
psychologist as much interested as the physical educator. The ever
tantalizing question has been: is there any correlation between
mental and motor ability?

Dozens of tests of this nature have been devised. Beginning
with the extensive use of the dynamometer in the early eighties,
psychologists set to work to discover a relationship between certain
forms of motor control and mental capacity. The aim of these tests
was to discover either (1) the extent to which an individual possessed
nervous energy or (2) the ability of an individual to use this nervous
energy in coordinating his mechanism in a particular way. These
may therefore be classed under two main heads: capacity and
endurance; and coordination.


Tests of Capacity and Endurance


1. The ergographic experiments (see page 126).

2. The dynamometer tests for endurance⁴² (see page 127).

3. The U-tube manometer or "40 mm. mercury test," which Flack
and Burton investigated in 1922. This test was used by the Royal
Air Force as one of the tests of physical efficiency and is popularly
known as the "endurance" or "fatigue" test. The nose is clipped,
the lungs are emptied, the individual inhales fully, blows mercury
in a U-shaped tube to 40 mm. and maintains it there as long as
possible without breathing; the cheeks and lips are supported by
one hand in such a manner that they take no part in the process.
From the other wrist the rate of pulse is recorded during periods of
five seconds. The length of time held and the type of pulse observed
vary according to the fitness of the subject for flying duties. The
average time for flyers holding mercury at 40 mm. is fifty-two
seconds.

42 Whipple, G. M., Manual of Mental and Physical Tests. Baltimore, Warwick and
York, 1910.


4. The "stairs test" as reported by Collins and Howe.⁴³

In this experiment the subject runs up and down a flight of
twelve steps, one step at a time, touching a key at the top of the 
flight and, in order to equalize the ground covered in each round trip, 
touching a wall opposite to and a step from the bottom of the flight. 
Each blow on the key is recorded by a signal magnet on a rapidly 
moving long-paper kymograph, on which the seconds are being 
recorded by another signal magnet. The time of each round trip on 
the stairs can be read to 0.01 of a second. The subject is instructed 
not to spare herself—to run courageously at her top speed till five 
round trips have been covered. When the capacity of the smoked 
drum is reached, the latter is shellacked and the time for the five 
round trips plotted. The object is to estimate from the shape of the 
curve, the subject’s endurance, the algebraic sum of her cardiac 
efficiency, her neuromuscular skill and economy, and her courage. 
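Collins and Howe judged the curve by its shape. As a hypothetical illustration of how that shape might be summarized numerically, the sketch below turns six timing marks read off the kymograph (the start of the run and the end of each of the five round trips) into round-trip times, then fits a least-squares slope in seconds per trip; a steeper upward slope means the runner is slowing more. The input format, function names and sample marks are assumptions, not part of the original procedure.

```python
def round_trip_times(marks):
    """Durations between successive timing marks, to 0.01 second."""
    return [round(end - start, 2) for start, end in zip(marks, marks[1:])]


def slowdown_slope(trips):
    """Least-squares slope of trip time against trip number (seconds per trip)."""
    n = len(trips)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(trips) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, trips))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical kymograph readings in seconds.
marks = [0.0, 6.2, 12.6, 19.3, 26.4, 33.9]
trips = round_trip_times(marks)
print(trips)                            # [6.2, 6.4, 6.7, 7.1, 7.5]
print(round(slowdown_slope(trips), 2))  # 0.33
```

A single slope discards much of what the original authors read from the curve (cardiac efficiency, skill and courage are not separable this way); it is offered only as one compact summary of how quickly the times lengthen.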


5. The general motor efficiency test as reported by Bilhuber.⁴⁴
This test is designed to serve as an all-around test because it com- 
bines the elements of running, jumping, climbing and throwing. The 
procedure in giving this test is described as follows: 


(a) Crouch start at starting line. Fingers and toes behind line.

(b) Run 20 feet to a pair of jump standards across which a line is
    stretched 2 feet high. Clear this height.

(c) Run 20 feet further to a gymnasium mat. Somersault on it
    or get both shoulders flat on mat.

(d) Crawl, slide or dive under a table 15 feet from the mat. Table
    is 2 feet, 2 inches high.

(e) Run 12 feet to a climbing rope. Climb to touch ribbon tied
    on the rope 9 feet above the end of the rope.

(f) Run 12 feet further and pick up a volley ball.

    Note: At this point the course changes to run at right angles to
    the course just completed.

(g) Run 22 feet with the ball and toss it over a cord stretched
    between a second pair of jump standards 8 feet high. Catch
    the ball on the other side of the cord. Run 12 feet with it and
    place it on the floor on a mark. Here again the course changes
    to run parallel with the first portion.

(h) Run 17 feet to the boom. Walk from end to end on the bal-
    ance beam (lower boom) without using hands to aid in
    balancing.

(i) Run 10 feet further to three Indian clubs which must be
    transferred from one circle to another, 4 feet away in the same
    line. (That is, across the direction of the course.) Transfer
    one by one and place upright.

(j) Sprint for finish line 40 feet away.


Time is taken with a stop watch from beginning to finish. Knocking
the clubs down or missing the ball, etc., necessitates a new trial.

43 Collins, Vivian D. and Howe, Eugene C., "A Preliminary Selection of Tests of
Fitness," Am. Phys. Educ. Rev., XXIX (December, 1924), 567.

44 Bilhuber, Gertrude, "Functional Periodicity and Motor Ability in Sports," Am.
Phys. Educ. Rev., XXXII (January, 1927), 25.


Tests of Coordination 


A. Finer Muscle Adjustments. 


1. Tapping. An index of voluntary motor ability.⁴⁵ The
number of times an individual is able to tap with an iron stylus on a 
metal plate in a given length of time is recorded electrically. 

2. "Three hole test."⁴⁶ The number of times an individual is
able to tap in three holes arranged in a triangular fashion. 

3. Tracing. Accuracy, precision or steadiness of voluntary 
movement (Whipple). The individual attempts to follow a line 
with unsupported arm and hand. 

4. Steadiness of involuntary motor control. 


(a) Series of holes arranged in an apparatus are 32, 20, 16, 
13, 11, 10, 9, 8, and 7 sixty-fourths of an inch in diameter. 
Attempt to hold metallic needle in each hole for fifteen 
seconds without touching the sides of the hole. The arm 
is free from all support and the apparatus should be 
placed so that the forearm makes an angle of 100 degrees 
with the upper arm.⁴⁷

(b) Holes 1%, of an inch in a brass plate set at an angle of 45 
degrees to the subject. Metal stylus to be put into each 
one of the holes without touching the sides. Score is 
number of taps per minute without touching sides.⁴⁸


5. Aiming. Accuracy or precision of movement.⁴⁹ Ten marks
to hit in succession so arranged that they are not in line in any
direction and yet can be placed in a comparatively small space about
5 inches square. Thirty trials are given.

6. The Pursuitmeter.⁵⁰ Following the oscillations of a watt-meter
needle with a pointer operated by hand, like following a
traveling object with a gun. Arrangement for recording accumulating
errors on a revolving drum. The reactions of the watt-meter
needle vary every three seconds. Experiments last five minutes.

45 Whipple, G. M., op. cit.
46 Garfiel, Evelyn, op. cit.
47 Whipple, G. M., op. cit.
48 Garfiel, Evelyn, op. cit.
49 Whipple, G. M., op. cit.


B. Large Muscle Adjustments. 


1. Pursuit Pendulum.⁵¹ A pendulum discharges water in a small
stream. The water is caught at each swing in a ¾-inch diameter cup
and measured to indicate neuromuscular control at the end of
twenty-five swings. Advantages claimed for the test are that it
measures eye-hand coordination, quickness, precision and steadiness
of movement, and that it engages interest.

2. Ataxiameter.⁵² Apparatus measuring and recording the
swaying of body when individual tries to stand motionless. Harness 
attached to head to which is attached recording device for measuring 
anteroposterior and lateral sway of body. 

3. Target Test.⁵³ “The apparatus and method used in this test
follow the general idea of the ‘three hole test'." The subject stands 
with her left foot between cleats and makes a full fencing charge 
with a foil toward a target on her right. This target is triangular 
in shape and contains three points to hit in the angles of the triangle. 
The center of the triangle is placed on a level with the center of the 
subject's right shoulder. Another target is placed directly over the 
right shoulder “at such a height that the tip of the foil can reach it 
with about 3 inches to spare. . . . Targets consist of copper discs, 
4 inches in diameter and 2$ inch deep. Surrounding and separate 
from the targets is a field of galvanized iron. The latter and the 
targets are separately wired up with the foil so that contact with 
the targets rings a bell, and contact with the galvanized iron sounds 
a buzzer. . . . The subject is directed to lunge at the target to 
measure the distance. She is told that two factors count: speed and 
accuracy, and that she is to make a hundred thrusts at the targets, 
alternating between the targets on the triangular standard and that 
overhead, and taking the former in clockwise rotation. She then 
assumes an erect position with the tip of the foil pointing toward 
the lower left target. The observer gives her the word to start and 
at the same time starts the stop watch. Throughout the test the 
observer counts the thrusts, tallying hits with pencil and paper. 

50 Miles, W. R., "The Pursuitmeter," Jr. of Exper. Psych., IV (April, 1921), 77.
51 Miles, W. R., "Pursuit Pendulum," Psych. Rev., XXVII (September, 1920), 361.
52 Miles, W. R., "Static Equilibrium as a Useful Test of Motor Efficiency," Jr. of
Indust. Hyg., III (February, 1922), 316.
53 Collins, Vivian D. and Howe, Eugene C., "A Preliminary Selection of Tests of
Fitness," Am. Phys. Educ. Rev., XXIX (December, 1923), 563.



The watch is stopped on the hundredth stroke. The score index 
is obtained by dividing the number of hits by the time. 

“This test was planned to measure the accuracy of a rapid series 
of coordinations involving a large assortment of fine and coarse 
musculatures and depending, to a certain extent, upon general 
strength and endurance.” 
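The score index for this target test is simply hits per second of elapsed time over the series of thrusts. A minimal sketch, with a hypothetical function name and sample result; the check on the hit count reflects the fixed series of one hundred thrusts described above.

```python
def target_score_index(hits: int, seconds: float, thrusts: int = 100) -> float:
    """Number of hits divided by the time taken for the series of thrusts."""
    if not 0 <= hits <= thrusts:
        raise ValueError("hits must lie between 0 and the number of thrusts")
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return hits / seconds


# Hypothetical subject: 72 hits in 90 seconds.
print(target_score_index(72, 90.0))  # 0.8
```

Because speed and accuracy both count, a subject can raise the index either by hitting more often or by finishing the hundred thrusts sooner.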
4. Balance Tests. A. A test proposed by Collins and Howe⁵⁴
is designed to measure the reliability of the subject's tactual and 
kinesthetic sense and the coordination of her neuromotor response. 
The apparatus consists of a frame balanced on two hinges at the base 
on which the individual stands. The two hinges at the top of the 
frame are attached to a bracket screwed to the wall. The subject, 
in standing on the base of the frame, takes hold of two handles 
capable of motion in any direction. She is then told to close her 
eyes and the operator starts two motion adders which record the 
movement of the frame on its supports, that is, upward motion of 
each side of the frame. In other words, the individual attempts to 
balance on this frame for one minute and her non-balancing is 
recorded in total in units for comparison with other subjects. 

B. In order to improve balance beam tests as measures of dynamic
balance, the Springfield Beam-Walking Test⁵⁵ presents standardized
procedures. The equipment includes nine 41-inch high oak beams
each ten feet long, but of different widths: [illegible], [illegible], 1, 1½, 2,
2½, 3, 3½ and 4 inches. The beams are calibrated in quarter lengths, and
an individual's score is the value of the segment in which his second 
fall off occurs. A heel to toe step is required, and foot length de- 
termines where an individual starts on each beam. Four trials 
yield the best reliability for adults. Formal validation studies of 


this procedure have not been reported. 

C. Bass⁵⁶ reports an extensive study whose purpose was to devise
reliable tests of both static and dynamic balance and to determine
the components of balance which the tests measured. Her test of
Static Balance requires sticks 1 inch × 1 inch × 12 inches, and a
stop watch. Three groups of positions are timed: the first, standing
on the ball of one foot crosswise on the stick; the second, standing
on the floor on the ball of one foot; and the third, standing on the
ball of one foot lengthwise on the stick. For each group the subject
is timed standing straight with eyes open, standing straight with
eyes closed, bending with hips at right angles with eyes open, and
bending with hips at right angles with eyes closed. The score is
the number of seconds the position is held. Reliability coefficients
for the items of this test range from .721 to .901. Adequate validat-
ing criteria are lacking, but these tests correlated approximately .50
with sensory rhythm and general motor ability tests.

56 Bass, Ruth I., "An Analysis of the Components of Tests ... Dynamic
Balance," Res. Quart. Am. Assoc. for ...

D. The Stepping Stone Test of Dynamic Balance requires a diagram
painted on the floor which consists of eleven circles 8½ inches in
diameter placed according to a standard zigzag pattern. The sub- 
ject leaps from circle to circle on alternate feet landing on the ball 
of the foot, and attempting to stay at least five seconds in each 
circle so his over-all time will approach the top standard of fifty 
seconds for the trip. The score is the total time for the trip plus 
fifty minus three times the total number of errors which are counted 
for such digressions as hopping to maintain balance within a circle, 
sliding, touching heel to floor, touching opposite foot to floor, 
missing circle and the like. Reliability for this test is reported as
r = .952. The test correlates .739 with the rhythm judgment
criterion and .687 with the motor ability criterion.
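The Stepping Stone scoring rule, as stated (total time for the trip plus fifty, minus three times the number of errors), can be written as a one-line function. The name and the sample trial are hypothetical. Under the rule as printed, a slower, more controlled trip earns more points, which fits the instruction to hold each circle for about five seconds.

```python
def stepping_stone_score(total_seconds: float, errors: int) -> float:
    """Total time for the trip plus fifty, minus three points per error."""
    if errors < 0:
        raise ValueError("error count cannot be negative")
    return total_seconds + 50 - 3 * errors


# Hypothetical trial: 48 seconds with 4 errors.
print(stepping_stone_score(48, 4))  # 86
```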

5. Test of Foot Speed.⁵⁷ The individual runs in place on a
platform that is so wired as to record each step. The number of
steps taken in thirty seconds serves as the index.

6. Picking Up Paper.58 A piece of writing paper 11 inches long 
is folded down the middle and made to stand on the floor. The 
experimenter stands in front of the open part of the paper and says, 
“Now hold your right toe in your left hand (crossing the foot behind
the body) and pick up the paper with your mouth. You will be
allowed enough time to do it, but speed counts. If you lose your
balance or throw over the paper, stand up, right the paper and begin
again.” The trial is scored as correct if it is accomplished in sixty
seconds or less with two falls or less.
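The Picking Up Paper stunt is scored pass or fail on two conditions, time and falls. The short Python check below states that rule exactly; the function name is hypothetical, and time is assumed to be measured in seconds.

```python
def picking_up_paper_correct(seconds: float, falls: int) -> bool:
    """True if the stunt is done in 60 seconds or less with 2 falls or less."""
    if seconds < 0 or falls < 0:
        raise ValueError("time and falls must be non-negative")
    return seconds <= 60 and falls <= 2


print(picking_up_paper_correct(45.0, 1))  # True
print(picking_up_paper_correct(61.0, 0))  # False
```

Both limits are inclusive, matching "sixty seconds or less" and "two falls or less".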

7. Target Test.59 The subject stands 12 feet away from a
circular target on which are concentric circles of 30-, 20-, 11-, and
1-inch diameter. A tennis ball is then thrown at the target five
times, and the number of points scored by the five throws serves as
an index of accuracy. Hits are counted as follows:

30-inch circle ................ 1 point
20-inch circle ................ 5 points
11-inch circle ................ 5 points
Bull’s eye, 1-inch circle ..... 10 points

57Garfiel, Evelyn, op. cit.
58 Ibid.
59 Ibid.

Motor Ability, Capacity and Educability 165
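Totalling the Target Test is a table lookup summed over five throws. The Python sketch below uses the point values as printed in the scoring table. Two rules are assumptions not stated in the text: a throw is credited for the smallest circle it lands in, and a throw that misses the target scores zero. The function names are hypothetical.

```python
from typing import Optional, Sequence

# Points per circle, keyed by circle diameter in inches, as printed in the table.
POINTS_BY_DIAMETER = {30: 1, 20: 5, 11: 5, 1: 10}


def throw_points(smallest_circle_hit: Optional[int]) -> int:
    """Points for one throw; None means the ball missed the target (assumed 0)."""
    if smallest_circle_hit is None:
        return 0
    if smallest_circle_hit not in POINTS_BY_DIAMETER:
        raise ValueError(f"no circle of {smallest_circle_hit} inches on the target")
    return POINTS_BY_DIAMETER[smallest_circle_hit]


def target_test_score(throws: Sequence[Optional[int]]) -> int:
    """Accuracy index: total points over the five tennis-ball throws."""
    if len(throws) != 5:
        raise ValueError("the test uses exactly five throws")
    return sum(throw_points(t) for t in throws)


print(target_test_score([1, 11, 30, None, 20]))  # 21
```

Keeping the point values in a single dictionary means a different reading of the table can be substituted without touching the scoring logic.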


Included in the list of single tests just cited are a number which 
may be valuable in determining various single and complex capac- 
ities. Some are adapted to testing the ability of the small-muscle 
groups and some to the testing of certain large-muscle groups. 
Some have been devised for the purpose of determining the relation- 
ship of physical capacity to mental capacity and some in an attempt 
to find a single test by which we can measure general physical ability. 
This quality, however, like mental ability, seems too complicated
in nature to be measured by a single test.

The field of physical education is particularly interested in the
efficiency of the large or “big-muscle” groups and hence will prob-
ably discard tests of the small-muscle groups as unsuitable for our
purposes. Before this is done, however, a reasonable amount of
experimentation with the most prominent of them should be conducted
to discover their relationship to “big-muscle” ability. The experi-
ments of Collins and Howe60 with nine of these tests do not indicate
the likelihood of finding a very marked correlation between any one of
them and general motor efficiency, at least with homogeneous
groups. This conclusion, however, is based upon a very limited
amount of data (tests on only 24 individuals).

A more recent study by Seashore61 confirms their findings, how-
ever. He compared the results from a large number of tests of fine
motor abilities and gross motor abilities. His conclusions were:

1. No over-all or general positive dependence or interrelatedness of
fine motor abilities and gross motor abilities has been found.

2. No evidence appears that within the scope of the tests used
herein and in similar experiments, any fine and gross
motor abilities have any positive relationship of dependency. It is
plausible that the finer and grosser aspects of control of postural
position may prove to be a factor.

3. Some activities which upon superficial analysis are called large
muscle activities do involve finer motor coordinations. The problem
for research, then, is not one of looking for dependency of fine and
gross coordinations, but rather is one of determining the respective
roles of all the ability variables which are contributory to the whole
activity.62

60Collins, Vivian D. and Howe, Eugene C., op. cit.
61Seashore, Harold G., “Some Relationships of Fine and Gross Motor Abilities,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 3 (October,
1942), 259-274.


Selected References 


ATWELL, WILLIAM O. and ELBEL, EDWIN R.: “Reaction Time of Male High
School Students in 14-17 Year Age Groups,” Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 19, No. 1 (March, 1948), 22-29.

Reports a study exploring the problem of the age at which maximum efficiency 
in certain neuromuscular skills is reached. 

BRACE, DAVID K.: Measuring Motor Ability, New York, A. S. Barnes and Com-
pany, 1927. Pp. xvi and 138.

Brace’s Scale of Motor Ability Tests consists of 20 tests of the stunt type scored 
by the pass or fail method. This battery is one of the earliest scientific studies 
in the area. 

BURLEY, LLOYD R.: “A Study of the Reaction Time of Physically Trained Men,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 3 (October,
1944), 232-239.

Indicates findings of a study on the differences in reaction time of athletes and 
nonathletes, and athletes in different sports. Includes a bibliography of related 
researches. 

COZENS, FREDERICK W.: The Measurement of General Athletic Ability in College
Men. Eugene, University of Oregon Press, 1929. Pp. 71; Cozens, Frederick W.: 
Achievement Scales in Physical Education Activities for College Men, Chapter V. 
Philadelphia, Lea and Febiger, 1936. Pp. 118. 

Since the original formulation of the General Athletic Ability Test for College 
Men, one test item has been changed and new scoring scales have been con- 
structed. The development of the test is described in detail in the first reference 
and the test as now used is outlined in the second reference. 

ESPENSCHADE, ANNA: “Development of Motor Coordination in Boys and Girls,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 18, No. 1 (March, 
1947), 30-45. 

Reports a study which used the Brace Motor Ability Test as an instrument to 
show the effects of growth and maturity on components of motor coordination 
such as agility, balance, flexibility, strength and control.

GARFIEL, EVELYN: The Measurement of Motor Ability. Archives of Psychology,
No. 62, April, 1923. Pp. 47. 

Here will be found a number of tests of coordination of both the small and 
large muscle groups. This is one of the first strictly scientific measurement 
studies in physical education. 

McCLOY, CHARLES H.: “A Preliminary Study of Factors in Motor Educability,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 2 (May,
1940), 28-39.

Defines motor educability, discusses its components, describes studies which 
have been made and findings reported, and outlines further research which is 
needed in this area. 

SEASHORE, HAROLD G.: “Some Relationships of Fine and Gross Motor Abilities,” 
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 3 (October, 
1942), 259-274. 

In addition to presenting interesting findings on the relationship between fine 
and gross motor abilities, a description is given of a variety of fine motor ability 
tests, and a complete bibliography included. 


62 Ibid.


CHAPTER IX

Physical Fitness and Motor Fitness Tests


Physical Fitness and Motor Fitness Defined. Physical fitness
has been a consistently accepted objective of physical education
throughout the history of the field. Not a little of the difficulty in
measurement in this area has been due to the lack of a concise and
generally accepted definition of the term physical fitness. The
problem is further complicated by lack of understanding of the
degree of interrelationship among its components. The recently
published report of the Subcommittee of the Baruch Committee on
Physical Medicine1 serves to clarify, though by no means solve,
problems of measurement of physical fitness.

The committee recognizes that “physical fitness is a complex
concept difficult to define and more difficult to measure, but one
which in its most useful form must evaluate the total individual.”2
It is the committee’s concept that “the only final test of fitness
seems to be the ability to perform the task desired without undue
fatigue or exhaustion, and the qualities making this possible are
those of the total personality.”3 In other words, “Physical fitness
describes the functional capacity of the individual for a task. It has
no real meaning unless the task or job for which fitness is to be judged
is specified.”4 The committee emphasized the futility of attempting
to apply one set of criteria of physical fitness to the varying types of
individuals with their multiplicity of activities.

1Darling, Robert C., Eichna, Ludwig W., Heath, Clark W. and Wolff, Harold G.,
“Physical Fitness — Report of the Subcommittee of the Baruch Committee on
Physical Medicine,” Jr. Am. Med. Assoc., Vol. 136, No. 11 (March 13, 1948),
764-767. 2 Ibid., p. 765. 3 Ibid., p. 764. 4 Ibid., p. 767.

Seven broad groups, not even homogeneous within themselves, are listed by the com-
mittee to emphasize the range of the problem of measuring fitness
in terms of a specific task. This list includes:


1. Athletes, persons of school and college age and soldiers and 
sailors in training, considered as athletically occupied. Various 
degrees and kinds of fitness for different sports may require emphasis 
on differing aspects of physical fitness even in this sphere.

2. Laborers, including those factory workers, farmers and miners 
who are required to do strenuous physical tasks. 

3. Sedentary workers, including the white collar class, profes- 
sional groups, business executives and supervisory factory workers. 

4. Workers in specialized duties, such as airplane pilots, police- 
men, locomotive engineers, submarine personnel and other special- 
ized personnel of the armed forces. 

5. The functionally handicapped (otherwise healthy). Here
would be included the orthopedically handicapped and special 
subgroups such as the blind, the deaf and the mentally deficient. 

6. Convalescents, which might include both those recovering 
from acute self-limited infections and those with more chronic 
debilitating diseases. 

7. The aged, including those suffering from changes attributable 
to age, without specific disease. 

The first four of these groups contain those who are presumably 
well and healthy; the last three groups concern those that present 
more distinctly medical problems. 


Even in the limited fields of athletic fitness and fitness for hard 
muscular work the picture presented by previous workers is not 
simple. The following list is of the four main aspects of fitness 
emphasized in this group. A similar approach to the subject should 
preface the study of fitness for any task. 

1. General cardiorespiratory, vascular and neuromuscular fitness: 
This concerns the ability to perform work requiring the more stren- 
uous use of large muscle groups and placing stress on the heart and 
lungs, the blood supply to the muscles and the internal economy of 
the body. 

2. Skill and agility: This involves the smooth integration of the 
neuromuscular system for either delicate or gross tasks. In part, it is 
an inborn characteristic; in part, it is the result of past experience. 
All tasks require some degree of this quality. It includes not only 
skill in an accustomed task but also rapid adaptability to new tasks. 

3. Special strengths: Actual physical strength for specific tasks. 
The stoutness or mass as well as quality of bone, muscle and joint 
here come into consideration. 

4. “Motivation”: By this we mean the goal which a man seeks
consciously or unconsciously by his efforts, his “will to do” and “will
to succeed.” It is so intimately linked with physical fitness, however
classified, that it should be considered in any investigation of
physical fitness.

Physical Fitness and Motor Fitness Tests 169

These four subdivisions present only a sketchy outline of the as- 
pects of athletic fitness. Different aspects of fitness are brought out 
by exercise of short, moderate or long duration, although endurance 
to severe exercise of several minutes’ duration and that to more 
moderate exercise of long duration, such as mountain climbing, have 
great similarity. However, the ability to expend a single sudden 
burst of energy would appear to involve different processes than 
those exercises leading to gradual exhaustion. Resistance to fatigue 
has not been mentioned in these subdivisions since it comes in all. 
The factors leading to fatigue and the “breaking point” are nonethe- 
less important and worthy of separate consideration. Likewise, the 
study of physical constitution and body build enter into the four 
aspects mentioned. 

The subdivision motivation, although placed last in the list, might 
well be considered first in most practical situations, especially in the 
interpretation of laboratory tests. Universal experience has been 
that the physical could not be adequately dissociated from the 
psychologic aspects. In fact, the outcome of even simple and rather 
artificial tests of physical performance is often determined by various
psychologic factors lumped together under the term “motivation.”
Of course, motivation is often quite different in a test situation than
in the tasks of daily life.5

The committee emphasizes the dangers of oversimplification of
the problems of measurement of physical fitness, and goes on to
state: “Moreover, it is unwise and indeed impossible, to define
standards of physical fitness at the present time. These negative
conclusions by no means invalidate the concept of physical fitness;
rather they serve the useful purpose of indicating just where knowl-
edge of the subject stands, thus preventing conclusions on false
premises.”6

In the place of simple tests the committee recommends an outline
of principles which should govern the approach to physical fitness
guidance.

Among the general principles set forth, these are outstanding:
(1) The orthodox medical examination and medical history should
not merely discover defects but also should consider the potential
performance of the patient in spite of his defects and include the
reactions of the patient to his daily activities. (2) Several test
exercises severe enough to place considerable stress on the subject
should be used as an adjunct to the medical examination to assess
the capabilities of the cardiovascular-muscular system. (3) Through
“An orderly evaluation of the patient from his history, his conversa-
tion, his general attitude and purposes in life, his reactions to simple
situations,” judgments should be formed as to “the total individual,
his will to do, his motivation, and the factors behind.”7

5Darling, Robert C., et al., op. cit., pp. 764-765.
6 Ibid., p. 765.


This review of the Baruch Subcommittee Report should assist the 
student in understanding the scope and importance of the physical 
fitness objectives of physical education, the measurement problems 
related thereto, and the relationship of separate tests to the total 
problem. The need to recognize the limitations of a single index as 
a measure of fitness is again emphasized. 

The remainder of this chapter is primarily concerned with motor 
fitness tests and indices, particularly those developed during the 
period of World War II. It includes also a brief review of earlier 
work related to this area. Other chapters, including those on
Athletic Achievement Tests and Scoring Scales, Cardiac Functional 
Tests and Measurement of General Qualities, deal with additional tests 
whose stated purposes are to measure aspects of physical fitness or 
physical condition. In the last analysis, however, in terms of the 
broad definition of physical fitness proposed by the Baruch Subcom- 
mittee, achievement in tests of all types is useful in the total fitness 
appraisal of the individual. For example, according to the Baruch 
Report with its emphasis on defining fitness in terms of a specific 
task, fitness for the varsity tennis squad would include the possession 
of the skills of tennis, as well as the physical endurance and psycho- 
logical set necessary to withstand the rigors of competition. 

In terms of the Baruch Report a simple classification of physical 
fitness measures might be: (1) medical fitness — measures of the 
standard health examination; (2) motor intelligence — measures com- 
monly called motor ability, capacity, and educability; (3) cardio- 
vascular fitness — measures of circulatory-respiratory condition; 
(4) motor fitness — measures of muscular and neuro-muscular con- 
dition; (5) skills fitness — measures of specific activity skills and all- 
around ability. 

In view of the emphasis on “motivation” in the Baruch Report,
also to be included in the listing are: (6) knowledges — measures of
information; (7) attitudes and habits — measures of behavior and
belief patterns; (8) character — measures of ethical behaviors; and
(9) other psychological measures — including those of emotional pat-
terns, social adjustment, and the like.

7Editorial: “Physical Fitness as Physicians See It,” Jr. Am. Med. Assoc., Vol.
136, No. 11 (March 13, 1948), 771.

Motor fitness, as used in this chapter, refers to those measures 
whose basic components include such factors as strength, speed, 
agility, endurance, power and balance, and whose primary purpose 
is to determine the fitness of the body for strenuous work. Difficulty 
arises when one undertakes to classify tests according to purpose, 
name or content. In the first place, terminology in the field is 
neither consistent nor precise. Also, many tests combine components 
from the different categories listed above; e.g., some tests labeled
physical fitness combine elements of motor fitness and cardiovascular
fitness. In addition, some tests purport to measure given qualities
on the basis of an assumed relationship between various qualities.
For example, some of the motor performance tests purport to
measure qualities of motor fitness on the assumption that achieve-
ment in these activities (jumping, throwing, speed and distance
running, etc.) presupposes possession in a measurable amount of the
basic elements of motor fitness (power, strength, endurance, speed,
etc.). The general basis, however, which distinguishes available
motor fitness tests from athletic or motor performance tests is the
degree of learned skill required to perform the battery items. Never-
theless, some of the motor performance tests described elsewhere
in this text are closely related to motor fitness test batteries, and
may serve related purposes.


Recent Fitness Measures 
Physical Fitness Tests of the Armed Services. Testing was
given considerable attention in the physical fitness programs of the
Armed Forces during World War II. In view of the immediately
preceding discussion of the broader concept of physical fitness, it
should be noted that while the total fitness problem, including
medical, cardiovascular-muscular, and psychological fitness, was
well considered in the Armed Forces, the term physical fitness
programs in its common usage during this period generally referred
to physical training programs. Similarly, many of the civilian
physical fitness programs were principally concerned with physical
education activities. The term was used in a popular sense and did
not necessarily reflect lack of appreciation for the broader ramifica-
tions of the term. Most of the so-called physical fitness tests
developed in this connection are actually motor fitness tests, as
previously defined.

Tests in the Armed Forces were primarily directed toward 
determining: (1) physical (motor) fitness of men for duty, including 
fitness to return to duty after convalescence, and (2) effectiveness 
of the physical training programs. The development of adequate 
tests was subject to two major handicaps: first, the lack of a precise 
understanding of the amount and kind of fitness required for a given 
function, and second, the need for extreme administrative ease, 
because of the large groups to be tested. 

In the development of most of the tests, approved statistical
techniques were used and extensive research was conducted. Also,
procedures offered for administration of the tests suggest practical
methods for use of tests with large groups. Only tests of the major
branches of the Armed Services are described here. Many local
units, however, developed and used their own tests. Unfortunately,
much of this work did not appear in publication, and data on its
technical background are lacking. Scoring scales are available in
the sources listed for the tests described. These are not adaptable,
though, for school use, since the standards were set for a selected
group of men at an age level not comparable to that of school groups.

The women's services did not make extensive use of tests.8 The
Women’s Army Corps9 proposed a system of progressions in a series
of exercises designed to increase strength, endurance, agility, coordi- 
nation, balance, flexibility and body control. Some branches of the 
women’s services made limited use of the men’s tests, but because 
of their lack of adaptability to women the practice was restricted. 

Army Air Forces.10 Sit-ups with feet held; pull-ups; and 300-yard
shuttle run.


8Turnbull, Jenny E., “The Physical Training Program of the W.A.V.E.S.,” Jr. of
Health and Phys. Educ., Vol. 14, No. 9 (November, 1943), 470-472.
WAC Field Manual, FM 35-20; Physical Training. United States War Depart-
ment. Washington, D. C., U. S. Government Printing Office, 1943.

9Niles, Donna I., “Physical Fitness and the W.A.C.,” Jr. of Health and Phys.
Educ., Vol. 14, No. 8 (October, 1943), 408-411.

10Headquarters, Army Air Forces, “The Army Air Forces Physical Fitness
Research Program,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 15, No. 1 (March, 1944), 12-15.

Larson, Leonard A., “Some Findings Resulting from the Army Air Forces
Physical Training Program,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 17, No. 2 (May, 1946), 144-164.

Stansbury, Edgar B., “The Physical Fitness Program for the Army Air Forces,”
Jr. of Health and Phys. Educ., Vol. 14, No. 9 (November, 1943), 463-465.
Training — Physical Fitness Test. A. A. F. Regulation No. 50-10.


Navy Standard Physical Fitness Test.11 Squat thrusts, sit-ups,
push-ups, squat jumps, pull-ups. 

United States Army Specialized Training Division.12 Squat
jumps, push-ups, pull-ups, sit-ups, 100-yard run carrying man of 
own weight pick-a-back, 20-second squat thrusts, and 300-yard 
shuttle run. 

United States Army.13 A ten-item battery was used by the Army
as a basis for determining the effectiveness of its physical training 
program. Items included: pull-ups, 20-second squat thrusts, broad 
jumps, shot-put, push-ups, 75-yard pick-a-back, dodging run, 
6-second run, sit-ups, and 300-yard run. 

At the present time the Army recommends a four-item battery
for use in conjunction with its physical training program, consisting
of pull-ups, push-ups, two-minute sit-ups, and 200-yard shuttle
run.14
A “Fit to Fight Test”15 used by some Army units included these
items: rope climb, 50-yard run, hand vault, chins, wall scale,
running broad jump, running high jump, standing broad jump,
five-hour hike, and swim (150 feet minimum).

United States Navy Aviation Training Division.16 The Naval
Pre-Flight Program, dealing with smaller groups of men, undertook
a more extensive fitness appraisal program. This program proposed
to determine the status of the cadet on his introduction to the
program of physical training in order that his progress could be
better directed and motivated toward the achievement of the pro-
gram objectives. The general purpose of the tests and measures
used was to appraise physical condition, as measured by the following
elements: jump and reach, push-ups, chinning, speed-agility run
(this test consisted of a 394-foot shuttle course with such obstacles as
hurdles, pick-up blocks, dodge posts, and scaling wall), Pack and Step
Tests (the Pack Test was subsequently eliminated from the pro-
gram), screening of posture and foot conditions, and measurements of age,
height, chest circumference, and abdominal circumference. Findings
were recorded on a standard Naval Aviation Physical Training
Record Card. An extensive rating system for achievement in other
physical activities of the program is described in the source listed.

11Physical Fitness Manual for the U. S. Navy, pp. 17-24. Bureau of Naval Per-
sonnel, Training Division, Physical Fitness Section, 1945.
12Douglas, Lowell N., “Some Results of an AST Program in Physical Education,”
Jr. of Health and Phys. Educ., Vol. 15, No. 5 (May, 1944), 254+.
13Bank, Theodore P., “The Army Physical Conditioning Program,” Jr. of Health
and Phys. Educ., Vol. 14, No. 4 (April, 1945).
14United States Army, Basic Field Manual, FM 21-20, Physical Training. War
Department. Washington, D. C., U. S. Government Printing Office, 1946.
15...bury, Preston B., “The Waterbury Physical Training Program,” Jr. of
Health and Phys. Educ. (1947), 63-66.
16United States Naval Institute, Mass Exercise, Games, Tests. U. S. Navy,
Aviation Training Division, Office of the Chief of Naval Operations, 1943.
(Now available from A. S. Barnes and Company, New York.)

United States Office of Education. Recognizing the impor- 
tance of motor fitness in the total physical fitness program, the 
United States Office of Education appointed a special Committee 
on Wartime Physical Education for High Schools, and a similar 
Committee for the College Program. These groups prepared two
publications17, 18 which outlined physical education programs
designed to contribute to physical fitness of pupils and students as a 
part of the war effort. The fitness test items suggested in these 
manuals include: 


For High School Boys: Push-ups, pull-ups, dips on parallel bars,
rope climb 15 and 20 feet, bar vault, sit-ups, leg lift, forward bend,
hanging half lever, back twist, potato race, jump and reach, standing
broad jump, running broad jump, running high jump, 100-yard
dash, 880-yard dash and 440-yard dash. Standards based upon use
of the exponent plan as suggested by Cozens, Trieb and Neilson19
are available for most items in this battery.

For High School Girls: Jump and reach, potato race, soccer 
throw-in, 40-yard free style swim, 20-yard free style swim. 

For College Men: Pull-ups, push-ups, rope climb, parallel bar dips, 
bar vault, sit-ups, 100-yard dash, squat thrusts, leg lifts, leg raising, 
jump and reach, squat jumps, standing broad jump, running high 
jump, 880-yard run. 

For College Women: 50-yard to 200-yard run, continuous squat
thrusts, pulse rate test, sit-ups, chinning, floor dips, hanging in
arm-flexed position, shoulder retractions measured by dynamometers,
20-yard hop, hopping in place, the Humiston Test, run-jump-throw
test, and posture screening.

17Federal Security Agency, U. S. Office of Education, Physical Fitness Through
Physical Education for the Victory Corps, pp. 74-82. Washington, D. C., U. S.
Government Printing Office, 1942.

18Federal Security Agency, U. S. Office of Education, Manual of Physical Fitness
for Students in Colleges and Universities, Chapter XII. Washington, D. C.,
U. S. Office of Education, 1943.

19Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., Physical Education
Achievement Scales for Boys in Secondary Schools, New York, A. S. Barnes and
Company, 1936.

University of Illinois Motor Fitness Tests.20 Extensive
studies were conducted on motor fitness at the University of Illinois
during recent years. Efforts were directed, not toward measuring
specialized sport skills, but rather toward measuring the funda-
mental motor fitness characteristics which underlie performance in
physical education activities. Cureton recognizes six components
of motor fitness, namely, balance, flexibility, agility, strength, power
and endurance. Factor analysis studies have supported the legiti-
macy of these six components. Fourteen-item and eighteen-item
test batteries were built to sample these components. The items
were validated against a thirty-item Motor Fitness Inventory. A
validity coefficient of .872 is reported for the fourteen-item test.
Items in the fourteen-item Motor Fitness Screen Test include foot
and toe balance, squat stand, trunk extension flexibility, trunk flexion
sitting, extension press-ups, man lift and let down, leg lifts and sit-
ups, medicine ball put, Illinois Agility Run, skin the cat, bar or
fence vault, chinning, standing broad jump and mile run. Complete
directions for administering the test, scoring scales and ratings are
available in the source listed. The eighteen-item test is primarily
designed for adult men and women, and requires no apparatus.

A seven-item Short Screen Test is proposed for situations requiring
greater administrative simplicity. This test includes dive and roll,
medicine ball put, bar vault, chinning, leg lifts and sit-ups, breath
holding and man lift. Minimum standards to pass are given. The
limitations of this test are clearly recognized by the test authors.
The reduction of the number of items and the pass or fail basis of
scoring lower both the validity and the reliability. On the other hand,
the test serves the purpose of quick screening to indicate poorly fit
subjects without requiring maximum performance of individuals in
poor condition. The total classification plan also includes appraisal
of body type, swimming ability and posture.

Motor Fitness Test for High School Girls.21 The purpose of
this motor fitness test is to measure . . . day requirements in handling
the body. It aims to measure qualities of balance, flexibility, agility,
strength, power and endurance. From a tentative form of nineteen
items two screen tests were prepared, one consisting of six items,
and one of twelve. Items include the following pairs: foot and toe
balance and dizziness recovery, trunk extension and trunk flexion,
kneeling jump and Illinois Agility Run, sit-ups and kneeling push-ups,
basketball throw and standing broad jump, 30-second squat thrust
and Brouha step test.

20Cureton, Thomas K., Physical Fitness Appraisal and Guidance, Chapter 15.
St. Louis, The C. V. Mosby Company, 1947.
21O’Connor, Mary E. and Cureton, Thomas K., Jr., “Motor Fitness Tests for High
School Girls,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 16,
No. 4 (December, 1945), 302-314.

The test was validated against the criterion of the composite item 
score. The test authors recognize the limitations of this method, 
but justify it in terms of the present practical limitations of accu- 
rately defining motor fitness. Validity coefficients for items selected 
for the final batteries from the original nineteen range from .415
to .616. Reliability coefficients range from .70 to .98 with all but one 
exceeding .90. Norms based on a limited sample are available. 
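Validating items "against the criterion of the composite item score" means correlating each item with a total built from all items. The text does not say how the composite was formed, so the Python sketch below assumes it is the sum of each item's standard (z) scores, which puts items measured in different units on a common scale. The names `pearson_r`, `composite_scores` and `item_validities` are hypothetical. The same `pearson_r`, applied to two administrations of one item, is one common way to obtain a test-retest reliability coefficient; Python 3.10 and later also provide `statistics.correlation` for this.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either list has no spread


def standard_scores(values):
    """Convert raw scores to z-scores so items in different units can be summed."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_scores(items):
    """items maps item name -> scores (one per subject); returns summed z-scores."""
    columns = [standard_scores(scores) for scores in items.values()]
    return [sum(per_subject) for per_subject in zip(*columns)]


def item_validities(items):
    """Correlate each item with the composite of all items (item included)."""
    composite = composite_scores(items)
    return {name: pearson_r(scores, composite) for name, scores in items.items()}
```

Because each item is also part of the composite it is correlated with, coefficients obtained this way tend to run somewhat high. That part-whole overlap is one reason validation against a composite is a limited criterion.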

The California Physical Fitness Pentathlon.?? The Cali- 
fornia Physical Fitness Pentathlon was designed for use with boys 
and young men in junior high school through junior college. Its 
purpose is to measure such components of physical (motor) fitness 
as coordination, speed, power, strength, endurance, flexibility, 
agility and balance. 

The items in the battery are divided into five groups as follows: 
Group I: standing broad jump and standing hop, step, and jump;
Group II: pull-up, rope climb, push-up; Group III: combination
75- and 150-yard run, 150-, 220- and 300-yard runs; Group IV: bar
snap for distance, bar or fence vault; Group V: frog stand, sit-up,
Burpee Test. For the Pentathlon, one event is selected from each
of the five groups. Equivalent scoring scales have been set for each
event, and the scores in the five events are added to give a total
score on the Pentathlon. To minimize the effects of size and
maturity on ability to perform, the classification plan of Cozens,
Trieb and Neilson23 is suggested for use with the Pentathlon.
Standards of performance are provided for each class of the classifi-
cation plan except for junior college men. A suggested class record
sheet for recording performances in the events accompanies the 
22“The California Physical Fitness Pentathlon,” Bulletin of the California State
Department of Education, Vol. XI, No. 8 (November, 1942), Sacramento,
California.

23Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., Physical Education 
Achievement Scales for Boys in Secondary Schools, pp. 10-13. New York, A. S. 
Barnes and Company, 1936. See also this text page 117. 


Physical Fitness and Motor Fitness Tests 177 


scoring scales. The Pentathlon was also recommended as a basis 
for interschool and intramural competition. 
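The Pentathlon total described above can be sketched as follows. This is an illustration only: the event names come from the five groups listed, but the point values in the example are hypothetical, since the actual equivalent scoring scales appear only in the 1942 bulletin. The function simply enforces the one-event-per-group rule and adds the five scale scores.

```python
# Sketch of the Pentathlon total: exactly one event from each of the five
# groups, each already converted to points on its equivalent scoring scale.
PENTATHLON_GROUPS = {
    "I": {"standing broad jump", "standing hop, step, and jump"},
    "II": {"pull-up", "rope climb", "push-up"},
    "III": {"75- and 150-yard run", "150-yard run", "220-yard run", "300-yard run"},
    "IV": {"bar snap for distance", "bar or fence vault"},
    "V": {"frog stand", "sit-up", "Burpee test"},
}


def group_of(event: str) -> str:
    """Return the group (I to V) to which an event belongs."""
    for group, events in PENTATHLON_GROUPS.items():
        if event in events:
            return group
    raise ValueError(f"unknown event: {event!r}")


def pentathlon_total(scale_scores: dict) -> int:
    """Add five scale scores after checking that each group is used once."""
    used = {}
    for event, points in scale_scores.items():
        group = group_of(event)
        if group in used:
            raise ValueError(f"two events chosen from Group {group}")
        used[group] = points
    missing = sorted(set(PENTATHLON_GROUPS) - set(used))
    if missing:
        raise ValueError(f"no event chosen from Group(s) {', '.join(missing)}")
    return sum(used.values())


# Hypothetical scale scores for one boy.
example = {
    "standing broad jump": 62,
    "pull-up": 55,
    "220-yard run": 70,
    "bar or fence vault": 48,
    "sit-up": 65,
}
print(pentathlon_total(example))  # 300
```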
Indiana University Motor Fitness Indices for High School
and College Age Men. In an effort to establish a test of motor
fitness which was valid and also met the criterion of administrative
economy, Bookwalter has developed four indices based on five
simple test items. These include:


Motor Fitness Index I = (chins + push-ups) × vertical jump
Motor Fitness Index II = (chins + push-ups) × standing broad jump
Motor Fitness Index III = (straddle chins + push-ups) × vertical jump
Motor Fitness Index IV = (straddle chins + push-ups) × standing broad jump


“The 12-item criterion against which these indices were validated
involved two or more measures of strength, velocity, motor ability,
and endurance. The validities of the above indices with this criterion
were .859 ± .01, .818 ± .01, .841 ± .01, .812 ± .01 respectively.”24

The indices must be computed on the basis of T-scores, not raw
scores. The source indicated includes T-score tables for this purpose,
and temporary norms. It also includes directions for determining
local norms for the indices.
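Because the indices are built from T-scores rather than raw scores, the computation can be sketched as below. Bookwalter's own T-score tables are not reproduced here, so this sketch assumes the common linear form T = 50 + 10(x − mean)/SD computed on a local sample with the population standard deviation; the function names and sample figures are hypothetical.

```python
from statistics import mean, pstdev


def t_score(raw: float, sample: list) -> float:
    """Linear T-score (assumed form): the sample mean maps to 50, each SD adds 10."""
    sd = pstdev(sample)
    if sd == 0:
        raise ValueError("sample has no spread; T-scores are undefined")
    return 50 + 10 * (raw - mean(sample)) / sd


def motor_fitness_index(t_pull: float, t_push: float, t_jump: float) -> float:
    """(chins or straddle chins + push-ups) x jump, every term a T-score.

    A vertical-jump T-score gives Index I or III; a standing-broad-jump
    T-score gives Index II or IV.
    """
    return (t_pull + t_push) * t_jump


# Hypothetical local samples of raw scores.
chins = [4, 6, 8, 10, 12]
push_ups = [10, 15, 20, 25, 30]
vertical_jump = [14, 16, 18, 20, 22]  # inches

index_i = motor_fitness_index(
    t_score(10, chins), t_score(20, push_ups), t_score(18, vertical_jump)
)
```

A student exactly at the local mean on all three items scores (50 + 50) × 50 = 5000, which gives a convenient reference point when reading local norms.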

Indiana Physical Fitness Test.25 The Indiana Physical
Fitness Test for High School Boys and Girls is a four-item test
battery designed to measure components of motor fitness, and
selected in light of administrative ease and economy of time. The
validity of the test was checked against a criterion of twelve highly
selected and varied motor fitness items (r = .767). The items include
straddle chins, squat thrusts, push-ups and vertical jump. A physical
fitness score is obtained by multiplying the sum of the scores on
the first three items by the score on the vertical jump. Norms are
given based upon Classification Index divisions for boys, and
Height-Weight Class divisions for girls.

Norms for this test have also been developed for the elementary
level, grades four to eight;26 norms were established for each

24Bookwalter, Karl W., “Test Manual for Indiana University Motor Fitness
Indices for High School and College Age Men,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 14, No. 4 (December, 1943).
25State of Indiana, Physical Fitness Manual for High School Boys, Bulletin
No. 136, State Department of Public Instruction, Indiana.
State of Indiana, Physical Fitness Manual for High School Girls, Bulletin
No. 137, State Department of Public Instruction, Indiana.
26Franklin, C. C. and Lehsten, N. G., “Indiana Physical Fitness Test for the
Elementary Level (Grades 4 to 8),” The Physical Educator, Vol. V, No. 8
(May, 1948).




of six classification groups. The groupings were determined arbi- 
trarily from Classification Index scores computed from the best 
combination of age, height and weight. 
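The Indiana physical fitness score is a sum multiplied by a single item, which can be sketched as below. The sketch assumes raw item scores (counts for the first three items, inches for the vertical jump); the figures in the example are hypothetical, and turning the result into a rating would still require the published norms for the pupil's Classification Index or Height-Weight Class.

```python
def indiana_fitness_score(straddle_chins: int, squat_thrusts: int,
                          push_ups: int, vertical_jump: float) -> float:
    """Sum of the first three items multiplied by the vertical jump score."""
    return (straddle_chins + squat_thrusts + push_ups) * vertical_jump


# Hypothetical performance: 10 straddle chins, 12 squat thrusts,
# 20 push-ups and an 18-inch vertical jump.
score = indiana_fitness_score(10, 12, 20, 18)
print(score)  # 756
```

One consequence of the product form is that a 10 per cent change in the vertical jump moves the score as much as a 10 per cent change in the combined count of the other three items.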

The Iowa Physical Fitness Battery for College Women.27
The purpose of the Iowa physical fitness battery is to determine the 
work capacity or dynamic physical fitness of college women. The 
battery includes the following items: chair stepping, bounce, 
sit-ups, obstacle race and vertical pull. T-scales for college freshmen 
and sophomore women are given. The test was validated against a 
work capacity criterion of two minutes maximum work output on 
the bicycle ergometer. The acceptable validity and reliability of 
this criterion have been indicated by Tuttle and Wendler.28 A .66
validity coefficient between the sum of the T-scores and the work 
output criterion was found. Various combinations of four of the 
five battery items reveal equivalent coefficients. 
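A validity coefficient such as the .66 reported here is a product-moment correlation between battery scores and the criterion. A minimal sketch of that computation follows; the data in the example are invented for illustration and are not the results of the Iowa study.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no spread has no correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: sum of the five item T-scores for each woman, and her
# two-minute ergometer work output (arbitrary units).
battery_sums = [230, 255, 248, 270, 262, 240]
work_output = [410, 450, 470, 500, 455, 430]
print(round(pearson_r(battery_sums, work_output), 2))
```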

Yale Motor and Physical Fitness Tests.29 To meet wartime
fitness needs of students, Yale University instituted a required 
physical education program designed to promote health and superior 
physical condition. To direct the individual student's program of
activities, three types of tests were given: medical (including
orthopedic), swimming, and motor and physical fitness. The motor and
physical fitness test included the following eleven items: chest
measurement, hand grip, push and pull, fence vault, chin-ups,
standing broad jump, vertical jump, trunk raising, dips on parallel
bars, rope climb (24 feet), and Brouha Step Test. The test was not
validated by statistical means. However, the items were selected on
the basis of the fitness needs of men entering the services as determined by


27Mohr, Dorothy R., “The Measurement of Certain Aspects of the Physical 
Fitness of College Women,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and 
Rec., Vol. 15, No. 4 (December, 1944), 340-350. 
Scott, M. Gladys and Wilson, Marjorie, "Physical Efficiency Tests for College 
Women,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 19, No. 2
(May, 1948), 62-69. 
Scott, M. Gladys, Mordy, Margaret and Wilson, Marjorie, "Validation of the 
Mass-Type Physical Fitness Tests with Tests of Work Capacity," Res. Quart. 
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 16, No. 2 (May, 1945), 
123-138. 

28Tuttle, W. W. and Wendler, A. J., “The Construction, Calibration, and Use of
an Alternating Current Electrodynamic Brake Bicycle Ergometer,” Jr. of Lab. 
and Clin. Med., Vol. 30, No. 2 (February, 1945), 173-183. 

29Murphy, Thomas W. and Wickens, J. Stuart, “Yale University Completes One
Year of Its Wartime Physical Training Program,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 14, No. 3 (October, 1943), 333-340.




a study of military fitness programs and the opinion of military 
authorities responsible for conditioning programs. Group averages 
of Yale Freshmen became the minimum standard for each test. 

City College of New York Program of Health and Physical
Fitness Evaluation.30 City College's health and fitness appraisal
program includes six phases: medical examination, strength test,
motor ability tests, personal hygiene inspection, swimming ability
tests, and a comprehensive written health test. The three physical
capacity items (strength, motor ability and swimming) are measured
as follows: strength by the Rogers Physical Fitness Index; motor
ability by a battery consisting of a sprint, baseball throw, broad
jump, and jump and reach; and swimming by the ability to swim
60 feet. Programs such as these indicate a trend in the field to
consider the over-all fitness appraisal of individuals, but they also
illustrate the need for much additional experimentation and research
to validate the tests used, and to determine the relationships among
the components of physical fitness.

The Andover Physical Fitness Testing Program.31 The
Andover Physical Fitness Testing Program takes cognizance of
three aspects of fitness: medical fitness, functional fitness, and motor
fitness. Medical fitness is appraised by a complete health
examination. Functional fitness is measured by the Step Test technique.
Motor fitness is tested by a series of activities requiring strength,
coordination and skill. The battery includes these items: vertical
jump, rope climb, standing broad jump, fence vault and swimming.

Swimming is tested by requiring each pupil to demonstrate his
ability in the front dive, back stroke, breast stroke, side stroke and
free style, and to swim 100 yards. Ratings are made on a 5-point scale.
Performance other than swimming is equated on the basis of an
age-height-weight coefficient plan modified from Cozens et al. Scoring
schedules are given.

While data on the validity and reliability of the tests used in this
plan are not indicated, the program does present a practical attempt
to appraise the broader aspects of physical fitness.


30Eberhardt, Charles J., “. . . Program of Health and Physical Fitness
Evaluation,” Jr. of Health and Phys. Educ., Vol. 16, No. 7 (September, 1945).

31Johnson, T., Schubert, Montville E. and Gallagher, . . ., “The Andover
Physical Fitness Testing Program,” Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. 15, No. 1 (March, 1944), 16-22.

Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., op. cit. See page 117.




Illinois High School Physical Condition Test and Standards 
of Performance.34 Many local programs have selected items
measuring aspects of motor fitness, and have constructed local 
norms. Typical of batteries of this type is the Illinois High School 
Physical Condition Test issued by the State Health and Physical 
Education Department. The items selected include: measures of 
arm and shoulder girdle muscles—pull-ups; measures of leg strength, 
endurance and coordination — squat jumps; measures of abdominal 
strength — sit-ups; measures of arm and shoulder extension strength 
— push-ups; measures of cardio-respiratory endurance — 1-mile run. 
Standards for boys of each age level, thirteen through eighteen, 
were developed on the basis of 12,000 cases. 


Earlier Fitness Measures 


While wartime emphasis on physical fitness gave impetus to the 
development and use of tests of the motor fitness type, measurement 
in this area was by no means new. As previously stated, physical 
fitness, though variously defined, has long been an objective of 
physical education programs, and as such the literature is rich with 
tests labeled physical efficiency, physical fitness, neuromuscular
fitness and the like. Then, too, many of the athletic ability tests or
general motor performance tests aim to indicate muscular condition
as well as athletic skill. These tests, however, are described in
Chapter V. 

Indices for Measuring Physical Efficiency. For a period of 
thirty-five years attempts have been made to solve the question of 
health, physiologic condition, muscular condition, etc., by putting 
together in a happy combination a series of very simple measure- 
ments and obtaining an index figure which could be applied to an 
individual. Though these attempts have not all proved their worth, 
they form interesting material for thought. Some of the more
important indices are listed below.

1. Oppenheimer's Scale. This scale for measuring general
physiologic condition with emphasis on nutrition is given by Williams.35


Coefficient of vital efficiency = (girth of upper arm × 100) / (girth of chest in expiration)


34Illinois High School Physical Condition Test and Standards of Performance.
Office of Public Instruction, Health and Physical Education Department, 
State of Illinois, September, 1944, (Mimeographed). 

35Williams, J. F., The Organization and Administration of Physical Education, 
p. 255. New York, The Macmillan Company, 1923. 




Measurements are taken in centimeters. 

The standards set are: excellent — 29 and above; good — 26 to 28; 
poor—less than 26. 
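Oppenheimer's coefficient and its three standards can be sketched as below. The printed bands (29 and above, 26 to 28, less than 26) leave values between 28 and 29 unassigned, presumably because readings were rounded to whole numbers; this sketch assumes anything from 26 up to 29 counts as good. The measurements in the example are hypothetical.

```python
def vital_efficiency(upper_arm_girth_cm: float, chest_expiration_cm: float) -> float:
    """Oppenheimer's coefficient: arm girth x 100 / chest girth in expiration."""
    return upper_arm_girth_cm * 100 / chest_expiration_cm


def oppenheimer_rating(coefficient: float) -> str:
    """Apply the standards; 28 up to 29 is treated as 'good' (an assumption)."""
    if coefficient >= 29:
        return "excellent"
    if coefficient >= 26:
        return "good"
    return "poor"


c = vital_efficiency(27, 90)      # 2700 / 90
print(c, oppenheimer_rating(c))   # 30.0 excellent
```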

2. Pignet’s Formula. This formula for estimating muscular 
strength was devised in 1901 and was reported by Martin, 1921. 
The weight in kilograms and the chest measure at expiration in 
centimeters are determined and added. This sum is then subtracted 
from the height in centimeters; the result is an index number whose 
size is taken as an index of physical efficiency according to the 


following table: 


Index number = Height − (weight + chest measure at expiration)
(Height and chest in centimeters, weight in kilograms)


Index number      Efficiency rating

Under 10          Very strong
10-15             Strong
15-20             Good
20-25             Fair
25-30             Weak
30-35             Very weak
Above 35          Useless


Martin’s note: “This is really not of proven value though con- 
sidered exceptionally good by many army medical men. However, 
its validity must be verified." 

In extensive studies on university students the formula proved
of little worth.36
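Given the doubts just noted, the following is offered only to make the arithmetic of Pignet's formula concrete. The printed bands share their end points (10-15, 15-20, and so on), so the sketch assumes each band includes its lower bound: an index of exactly 15 is rated Good, not Strong. The measurements are hypothetical.

```python
from bisect import bisect_right

# Upper limits of each band; each band is assumed to include its lower bound.
PIGNET_LIMITS = [10, 15, 20, 25, 30, 35]
PIGNET_RATINGS = ["Very strong", "Strong", "Good", "Fair", "Weak",
                  "Very weak", "Useless"]


def pignet_index(height_cm: float, weight_kg: float, chest_expiration_cm: float) -> float:
    """Height minus the sum of body weight and chest girth at expiration."""
    return height_cm - (weight_kg + chest_expiration_cm)


def pignet_rating(index: float) -> str:
    """Look up the efficiency rating for an index number."""
    return PIGNET_RATINGS[bisect_right(PIGNET_LIMITS, index)]


idx = pignet_index(175, 70, 85)   # 175 - (70 + 85)
print(idx, pignet_rating(idx))    # 20 Fair
```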

3. The Assessment of Physical Fitness.37 Because definite
relationships exist between the weight of the body, the length
of the trunk (sitting height) and the circumference of the chest,
as well as between each of these and the vital capacity of the lungs,
these four measurements have been used to determine physical
fitness, good physique and healthy development. The tables given
by Dreyer and Hanson have been calculated from measurements
taken on a large number of normal, healthy men and women and
give the information previously mentioned, that is, the weight of the
human body in its relationship to the length of the trunk, the cir- 
cumference of the chest and the vital capacity of the lungs. 


36Dawson, Percy M., The Physiology of Physical Education, p. 750. Baltimore,
The Williams and Wilkins Company, 1935.
37Dreyer, Georges and Hanson, G. F., The Assessment of Physical Fitness, p. 115. 


London, Cassell and Company, 1920. 




4. Rogers' Physical Fitness Index.38 Rogers' strength index and the
scheme by which it is calculated have already been mentioned.
“The physical fitness index is the achieved strength index of any
individual divided by the normal strength index for the individual's
age and weight.” A table of norms for age and weight is provided
by Rogers. After the division, the index is multiplied by 100 to
eliminate decimals; scores above or below 100 indicate the
percentage above or below the normal.
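The Physical Fitness Index arithmetic can be sketched as below. The normal strength index must come from Rogers' table for the individual's age and weight, which is not reproduced here, so it is passed in as a hypothetical value; rounding to a whole number follows the text's remark that multiplying by 100 eliminates decimals.

```python
def physical_fitness_index(achieved_si: float, normal_si: float) -> int:
    """Rogers' PFI: achieved strength index / normal strength index x 100."""
    if normal_si <= 0:
        raise ValueError("normal strength index must be positive")
    return round(100 * achieved_si / normal_si)


# Hypothetical boy: achieved SI of 1800 against a table norm of 1500.
pfi = physical_fitness_index(1800, 1500)
print(pfi)        # 120
print(pfi - 100)  # 20, i.e. 20 per cent above the normal
```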

The Use of Vital Capacity. Vital capacity, which many have
thought was valuable in measuring size of body, strength or
flexibility, has been discussed by Cureton39 after an intensive study of
the literature and a causal analysis of vital capacity. A partial list
of his conclusions is presented here:


1. The bulk of vital capacity is due to size as contributed through 
the height and weight factors. 

2. Strength or strength condition contributes only a very small
factor to vital capacity.

3. Vital capacity varies concomitantly with the growth of the 
body. Therefore, factors which affect growth will affect vital 
capacity. 

4. There may be a slight possibility that the Athletic Index
contributes a small circulatory-condition factor, but it is thought
that this is due more to flexibility of a type not measured.


It would seem logical to conclude from these statements that the 
inclusion of vital capacity in any index has little, if any, justification. 
This view is substantiated in a study by Van Dalen,40 where it is
shown that “lung capacity is of little significance as an element in 
the strength test... except as both are correlated with age, 
height and weight." Furthermore, “its addition makes no notable 
increase in the validity of the strength test itself, and so it may well 
be omitted from this battery.” 


38Rogers, Frederick Rand, Tests and Measurements Programs in the Redirection 
of Physical Education, p. 152. New York, Bureau of Publications, Teachers 
College, Columbia University, 1927. 

39Cureton, Thomas K., Jr., "Analysis of Vital Capacity as a Test of Condition for 
High School Boys,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 4
(December, 1936), 80-92. It should be noted that vital capacity refers to the
maximal volume of air which can be expired after a full inspiration. 

40Van Dalen, Deobold, “The Contribution of Breathing Capacity to the Physical 
Fitness Index,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 4 (December,
1936), 93-95.




Sargent's Test of Speed and Endurance. The aim of this
test was to measure speed and endurance, the qualities demanded in that
period. The test which Sargent41 devised to measure these elements
of speed and endurance consisted of six simple exercises:


1. Elbows to knees from a supine position.

2. Pulling up part of the body weight.

3. Pushing up part of the body weight.

4. Bending forward, touching the fingers to the floor.

5. Rising on the toes.

6. Sitting on the heels and returning to a standing position.


These exercises were continued for a period of thirty minutes
without rest, and the survivors were considered to be physically
efficient. “This new test,” says Sargent,42 “brought to the front
a different type of man from the one who had come forward in the
old strength test.”

In administering the test, one half of the class was set to watch 
and give the test to the other half on one day and the procedure was 
reversed on the following day. Thus the work of the instructor was 
materially reduced. Just how each individual was rated we have 
not been able to determine. Possibly the rating was based upon the 
number of minutes the individual was able to keep going on the
exercises. 

University of California Physical Efficiency Test.43 For
this test, after a careful medical examination, the
men were divided into two groups: (1) those showing organic
weakness, deformity or crippled condition, and (2) those men who
were physically sound. The latter group was given a physical
efficiency test, the object of which was to give each man an estimate 
of his relative physical efficiency, and to assign him to activity on 
the basis of his needs. The men were tested in running, jumping, 
vaulting, wall scaling and falling. The elements of the test were 
selected first because of their practical significance with respect to 
the physical emergencies of life, and second in order to point out 
individual deficiencies in strength and skill. Those passing each


41Sargent, Dudley A., “Twenty Years of Progress in Efficiency Tests,” Am. Phys.
Educ. Rev., XVIII (October, 1913), 452.

42Ibid., p. 455.

43Kleeberger, Frank, “Physical Efficiency as Measured at the University of
California,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 2 (May, 1932),
151-172.




event with a minimum grade of D were allowed to elect the activities 
in physical education they desired. Those failing the test were 
assigned to an activity to increase their efficiency along this line. 
A swimming test and a defense skills test were also included in the 
program. 

This test and the one that follows could properly be classified as athletic
ability tests, but they are described here to illustrate the persistent
interest of the profession in measuring aspects of physical fitness 
and organic efficiency through physical or motor skills. 

College Freshman Physical Efficiency Test. In order to
stimulate widespread interest in physical efficiency, the National
Collegiate Athletic Association published a freshman physical
efficiency test which demanded that at least 80 per cent of the
. . .

The four events selected were the running high jump, the running broad
jump, the 100-yard dash and the bar vault. These events were
chosen because it was felt that they were the best ones for testing
muscular efficiency and because they indirectly test organic
efficiency.
. . .

Though no attempt was made to set up an objective criterion by 
which this test could be validated, it does represent the judgment 
of a number of experts as to the qualities which should be combined 
to adequately rate a college woman physically. The final score 
represents the woman’s rating in three important phases of physical 
efficiency —a medical test, an anthropometric test and a motor 
ability test. One of the most interesting features of the rating is the 
weight placed on each of these phases. The medical test, containing 
fifteen items, permits a maximum score of 150 points; the anthropo- 
metric test, with four items, makes possible a maximum score of 40 
points; and the eight items of the motor ability test permit a
maximum score of 80 points. Relative weightings, therefore, are in the
approximate proportion of 4, 1, and 2 respectively. One question that
will naturally be raised concerns this weighting; statistical evidence
will be needed to settle it.

44Wayman, Agnes R., Education Through Physical Education, pp. 289-298,
Philadelphia, Lea and Febiger, 1925.
Wayman, Agnes R., “Testing and Scoring the Physical Efficiency of College
Women,” Res. Quart. Am. Phys. Educ. Assoc., Vol. I, No. 4 (December, 1930),
74-86.
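The relative weightings quoted above can be checked with a little arithmetic. Dividing each maximum by the smallest (40) gives 3.75, 1 and 2, which the text rounds to 4, 1 and 2; as shares of the 270 available points the three phases carry roughly 56, 15 and 30 per cent. A short sketch of that check:

```python
# Maximum points for each phase of the rating, as given in the text.
MAXIMA = {"medical": 150, "anthropometric": 40, "motor ability": 80}

total = sum(MAXIMA.values())       # 270
smallest = min(MAXIMA.values())    # 40

for phase, points in MAXIMA.items():
    ratio = points / smallest      # 3.75, 1.0 and 2.0 in turn
    share = 100 * points / total   # per cent of all available points
    print(f"{phase:15s} {ratio:5.2f} {share:5.1f}%")
```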

A number of purposes have been outlined for this battery of tests, 
among which may be mentioned: 


1. It provides a basis for permitting the election of activities by 
students taking required physical education; 

2. It enables an instructor to pick out students in need of special 
attention; 

3. It furnishes a means for classification, comparison, and for 
measuring achievement, and provides an incentive for improvement. 


The Measurement of Organic and Neuromuscular Fitness
in College Women.45 After a great deal of experimental work in
physical tests of all sorts, Collins and Howe suggested the following
scheme:


Purpose 
To measure organic and neuromuscular fitness in college women. 


Validation

The scheme was set up empirically after much experimental work
and observation as to what constitutes organic and neuromuscular
fitness. That it was set up empirically does not imply that it is far
from the truth.


Scoring 

Medical examinations are first given to all subjects. Those 
receiving 50 or more points are considered to be “medically passed.”
Those scoring less than 50 points belong to the medically restricted
group.


Medical examination 

1. Defects not likely to entail an immediate limitation of vigorous
exercise: eyes, teeth, tonsils, digestive tract, etc. (Deduct a maxi- 
mum of four points for uncorrected defects under these headings. 
Return the points according to completeness of correction.) 


45Collins, Vivian D. and Howe, Eugene C., "The Measurement of Organic and 
Neuromuscular Fitness,” Am. Phys. Educ. Rev., XXIX (February, 1924), 64-70.




TABLE VII
SCORING OF ORGANIC AND NEUROMUSCULAR FITNESS IN COLLEGE WOMEN

                                   Medically passed*       Medically restricted†
                                   Item      Section       Item      Section
                                   points    points        points    points
Initial award                                50                      18
Motor control:                               24                      18
  1. Objective
     Swim                          0 2 4                   0 2 4
     Run (50 yards)                [illegible]
     Jump (standing broad)         0 1 2                   (0 1 2)
     Vault (fence)                 0 1 2                   (0 1 2)
     Climb (14' rope, distance)    0 1 2
     Throw (baseball, distance)    0 1 2                   0 1 2
  2. Subjective
     Posture, carriage             0 4 8                   0 4 8
Physiometric tests:                          18                      6
     [illegible]                   0 3 6
     Cardio-vascular (Schneider)   0 2 4                   (0 2 4)
     Strength (Martin)             0 2 4
     Balances                      0 1 2                   0 1 2
     General coordination          0 1 2
Somatometric examination:                    8                       8
     Weight-height relation        0 2 4                   0 2 4
     Vital capacity                0 2 4                   0 2 4
Eusomatic index (maximum)                    100                     50


*“Subjects whose medical examination shows no defects requiring a limitation
of exercise are scored in this column with a 50-point premium.”

†“Subjects with medical defects demanding restriction of exercise are scored in
this column with an initial award of 18 points.”


2. Defects which restrict the subject's muscular exercise. (Deduct
and return points for “minor” defects, as in 1, above.)


The authors of the index, Collins and Howe, make the following
pertinent comments:


1. Subjects scoring more than 50 points (except in the unlikely 
case of a person receiving zero scores in all the motor, physiometric, 
and somatometric tests, and being penalized for uncorrected minor
defects) are automatically recognized as medically passed. Those 
scoring less than 50 points are recognized as belonging to the medi- 
cally restricted group. 

2. The scoring of the tests of objective motor control must be 
based on adequate data from the sex and age group tested. In all
cases, except possibly the climb, the area of the curve of distribution
of data may be divided appropriately, perhaps into thirds, and the
subjects scored accordingly, e.g., zero, two, or four. In subjective




control the scoring is not easy. No successful subjective method of 
scoring posture has yet been devised. An objective estimate of 
carriage is still more difficult to manage. Possibly the following 
points would have to be considered: position of feet, distribution 
of weight on the floor, thrust of the foot as it leaves the floor, 
sequence of flexions and extensions of the knees, lateral movement 
of the body, rotation of the pelvis, movement of the arms, mainte- 
nance of the interrelation of the position of body segments as in 
correct standing position. 

3. In the somatometric and physiometric tests there are no diffi- 
culties in the way of a provisional division of the data into thirds. 

4. The somatometric examination is reduced to include only 
measurements that bear on efficient endurance and vitality. The 
relative weights of the four chief divisions of the scoring system 
(50, 24, 18, 8 for the medically passed) are intended to indicate the 
probable importance of functional and dynamic tests as compared
with static somatometric tests. 
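Collins and Howe suggest dividing each test's distribution into thirds and awarding points by third. One way to sketch that is shown below. It assumes the cut points are the tertiles of the group's own scores (Python's statistics.quantiles with n=3) and that a score falling exactly on a cut goes to the higher third; the higher_is_better flag is an addition of this sketch for timed items such as the 50-yard run, where a lower time is better. The jump data are hypothetical.

```python
from statistics import quantiles


def score_by_thirds(values: list, points=(0, 2, 4), higher_is_better=True) -> list:
    """Award points according to the third of the distribution each value falls in."""
    low_cut, high_cut = quantiles(values, n=3)  # the two tertile cut points
    scored = []
    for v in values:
        if v < low_cut:
            third = 0
        elif v < high_cut:
            third = 1
        else:
            third = 2
        if not higher_is_better:  # e.g., run times, where lower is better
            third = 2 - third
        scored.append(points[third])
    return scored


# Hypothetical standing broad jumps (inches) for nine women, scored 0, 1 or 2.
jumps = [58, 61, 63, 66, 67, 69, 72, 74, 77]
print(score_by_thirds(jumps, points=(0, 1, 2)))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With a small group the cut points move as scores are added, which is one reason the authors call for "adequate data from the sex and age group tested" before fixing the divisions.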


Selected References 


BOOKWALTER, KARL W.: “What is a Physical Fitness Program for Boys,” Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 5 (October,
1944), 240-248.

Analyzes current practices as a source for determining appropriate pro- 
cedures for the organization and conduct of physical fitness programs for boys 
in high schools and colleges. Discusses objectives for physical fitness, types of 
activities recommended, and elements utilized in measuring and testing physical 
fitness. 

BROCK, JOHN D., COX, WALTER A. and PENNOCK, ERASTUS W.: “Motor Fitness,”
Suppl. to Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 
(May, 1941), 407-415. 

Outlines the components and importance of motor fitness. Discusses measur- 
able factors, and lists available tests of motor ability. 

CASSIDY, ROSALIND and KOZMAN, HILDA C.: “Trends in State Wartime Physical
Fitness Programs,” Jr. of Health and Phys. Educ., Vol. 14, No. 7 (September,
1943), 557-392.

Shows impact of wartime fitness emphasis on school programs of physical
education emanating from State Education Departments, and extent of
utilization of the Victory Corps Program.
CURETON, THOMAS K., JR.: Physical Fitness Appraisal and Guidance. St. Louis,
The C. V. Mosby Company, 1947. Pp. 566.
Section V, “Motor Fitness," describes components of motor fitness and a wide 
variety of measures. 


CURETON, T. K., JR., TURNER, CLAIR E. and LAYMAN, EMMA MCCLOY: “A
Selected Bibliography on Physical Fitness,” Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 14, No. 1 (March, 1943), 112-124.

Offers an annotated listing of books published through October, 1942 dealing
with the following aspects of fitness: General Hygiene, Nutrition, Standards
and Measurements of Fitness, and Scientific Analysis.




DARLING, ROBERT C., EICHNA, LUDWIG W., HEATH, CLARK W. and WOLFF,
HAROLD G.: “Physical Fitness: Report of the Subcommittee of the Baruch
Committee on Physical Medicine,” Jr. Am. Med. Assoc., Vol. 136, No. 11
(March 13, 1948), 764-767.

Every physical educator should be familiar with the complete report of this 
subcommittee. This source includes a list of individuals from the physical
education profession asked to review and contribute to the report. 

HALL, D. M.: “Motor Fitness of Farm Boys,” Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. 13, No. 4 (December, 1942), 452-441.

Describes the 4-H Club physical fitness appraisal program, which is con- 
cerned with physique, organic health and motor fitness. Illustrates use of 
the Wetzel Grid in a total fitness program, and also describes a four-item 
motor fitness test. 

KARPOVICH, PETER V. and WEISS, RAYMOND A.: “Physical Fitness of Men
Entering the Army Air Forces,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 17, No. 5 (October, 1946), 184-192.

Discusses how tests were used in the Armed Service programs, and indicates 
physical status of Air Force personnel as revealed by the Air Force fitness tests. 

LARSON, LEONARD: “Some Findings Resulting from the Army Air Forces Physical
Training Program,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 17, No. 2 (May, 1946), 144-164.

Describes how the A. A. F. Tests were built, including philosophy of Armed 
Forces on physical fitness testing, limits on testing, and findings from the use
of the tests developed.

SUPERVISORY STAFF OF THE STATE DEPARTMENT OF EDUCATION, EDITORS: Tests,
Standards and Norms for the Oregon Physical Education Program, “Elementary 
and Secondary Schools.” (Multolithed) Salem, Oregon, State Department of 
Education, 1947. Pp. 89. 

Lists components of motor fitness which are the concern of the physical 
education program. Describes tests and ratings of physical status, strength, 
endurance, power, speed, agility, flexibility, balance and basic skill movements 
through which a motor fitness appraisal of public school children can be made.

STEINHAUS, ARTHUR H.: “Fitness and How We May Obtain It,” Jr. of Health and
Phys. Educ., Vol. 14, No. 8 (October, 1943), 427-428.

Outlines six components of total fitness, and indicates the contribution of 
health educators, physical educators, recreation leaders and teachers in guidance 
for the attainment of total fitness. 

WALLER, WILLIAM H.: "Periodical Literature on Physical Fitness," Res. Quart. 
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 16, No. 1 (March, 1945), 18-25. 

Presents a listing of periodical articles published between September, 1939 
and December, 1943 dealing with the problem of physical fitness. 


CHAPTER X 


Sport Technique Tests 


The majority of school programs in physical education aim in 
part to improve the various possible skills that are involved in 
the entire gamut of physical activities. If a record is to be made 
of this improvement, an accurate measuring stick must be applied 
at definite intervals in the course of the education of the pupil. 
Such a measuring stick, if applied when the instructor knows 
nothing of the ability of his students, will indicate individual and 
group deficiencies in a particular activity and point the way to 
his teaching problem. It will be of further benefit at the end of 
a particular teaching period in indicating the actual results which 
he has accomplished. It may serve readily as a means of com- 
paring groups and of estimating objectively the relative efficiency 
of individual pupils. 
From the time of the general broadening of the physical
education curriculum (about 1916), game activities have held a
large place in the program, and continued attempts have been
made to measure the various techniques involved. These
attempts, empirical though many of them have been, have led
toward a more scientific development of measuring tools in this
field.
Fundamentals for Measuring Technique in Sport. In the
development of testing procedures to measure technique in sports,
these fundamentals must be kept in mind:

1. The sport must be analyzed by experts in the field to determine
the specific physical techniques which ought to be measured. In a
sport such as baseball, for example, an analysis might be something
like this:
(1) Throwing
(a) For accuracy
(b) For distance
(c) For speed and accuracy
(2) Catching
(a) Stationary: batted ground balls, fly balls
(b) While in motion: batted ground balls, fly balls
(c) Thrown balls
(3) Batting: hand-eye coordination
(4) Speed of legs plus ability to change direction quickly
(5) Sliding: body control

2. The selection of valid and reliable tests to measure these various
techniques; the procedures and statistical techniques for this
selection are described in Chapter XVIII.

3. Experimentation to determine norms and directions for their 
use. 

4. Definite directions for scoring and administering tests. 

Early Work in this Phase of Measurement. The first use of game 
elements as possibilities for testing in physical education comes in 
the development of the Athletic Badge Tests by the Playground and 
Recreation Association of America in 1913. These tests include: 

(1) Baseball throw for accuracy
(2) Baseball throw for distance
(3) Basketball throw for distance
(4) Volley ball serve
(5) Tennis serve
(6) Baseball throw and catch
(7) Basketball goal throw


As early as 1916, Reilly introduced into his program of rational 
athletics certain tests which involved the elements of game activities 
such as: 

(1) Baseball pitching
(2) Basketball goal throw
(3) Throwing the basketball
(4) Serving in tennis
(5) Putting in golf
(6) Driving in golf




These tests were followed by the work of Hetherington in de- 
veloping the California decathlon. New tests which were added 
include: 


(1) Punting a football for distance
(2) Soccer kick for goal
(3) Running and catching
(4) Tennis and volley ball serving


The work of the motor ability committee of the American Physical 
Education Association has been briefly mentioned in Chapter V. 
An important portion of this work relates specifically to game 
activity tests, the aim of which “is to break up the team games into 
interesting, teachable, measurable units, which can be used for mass 
class exercises on a limited space, without expensive equipment.” 

Fig. 7. Football goal for passing and catching practice (division
bars separate the goal area into Sections A, B and C).

To illustrate the possibilities for setting up objective tests in a
particular game activity, tests for rugby football may be cited. A set
of football goal posts (with cross-bar set at 136 inches, width 18 feet,
6 inches) is utilized for passing and kicking practice by having the
space between the cross-bar and the ground divided into a number
of openings or windows. Two 3-inch strips run from the bar to the
ground and divide the area into three vertical sections. The lower
sections, 14 inches from the ground and 19 inches high, are used for
center spiral passes in which the receiver either runs or passes. The
second sections, just above the lower sections and extending upward
39 inches, are used for center spiral passes for kicks, the center being
10 to 12 yards away from the window. The top windows just below
the cross-bar are used for testing short forward passes of 10 to 20
yards beyond the scrimmage line (that is, beyond the window). The
passer stands from 7 to 10 yards in back of the line receiving the pass 
through the middle window. Space above the cross-bar is used for 
scoring long forward passes, punting and receiving, drop kicking and 
receiving. The general idea of the committee would be to place 
several goals of this nature in vacant spaces in order that practice 
and instruction may be given to a very large group. As many as 
twenty-four men can work around such a goal as this and be kept 
quite busy with passing and catching practice according to a definite 
schedule. 

Specific Experimental Contributions. Most of the tests 
brought forward prior to 1924 included certain fundamental sport 
technique tests but little effort was made to construct a battery of 
tests which would measure (or attempt to measure) all or many of 
the fundamental skills involved in a particular game. Brace did 
pioneer work in this field and in 1924 reported a battery of six 
achievement tests in basketball skills, including: (1) shooting
baskets, (2) dribble and shoot, (3) single overhand throw at a target,
(4) push pass at a target, (5) speed pass, and (6) jump and reach.¹
These were later revised slightly and republished with his scale of
motor ability tests.² In this volume also are included achievement
tests in indoor baseball and soccer. All of these batteries of tests
were scaled according to the T-scale technique. Bliss’ work in 1927,
though primarily a study of progression in skills of boys and girls of
junior high school age, included four sport technique tests in
basketball and baseball.³

Perhaps the first piece of experimental work made with the idea
of setting up a battery of tests for measuring the essential physical
and mental qualities of playing ability in a sport was that done by
Elizabeth Beall⁴ in tennis. Some of the detail in connection with
this study will be reported later in this chapter. The more recent
contributions in sport technique and achievement tests are reported
in alphabetical order according to sports.

¹ Brace, David K., “Testing Basketball Technique,” Am. Phys. Educ. Rev.,
XXIX (April, 1924), 159-165.
² Brace, David K., Measuring Motor Ability, pp. 74-77. New York, A. S.
Barnes and Company, 1927.
³ Bliss, James G., “A Study of Progression Based on Age, Sex and
Individual Differences in Strength and Skill,” Am. Phys. Educ. Rev.,
XXXII (January and February, 1927), 11-21, 85-99.
⁴ Beall, Elizabeth, Essential Qualities in Certain Aspects of Physical
Education with Ways of Measuring and Developing Same. Unpublished
master’s thesis, University of California, 1925.


Archery 


Hyde's Archery Scales. Three experimental studies by Edith
I. Hyde, University of California at Los Angeles, have given to the
profession standards of achievement in archery for college women.⁵
The results of the first study are based upon data collected only at
the University of California at Los Angeles and present tentative
norms of achievement for beginning archers (women) at 20-, 30-,
40- and 50-yard ranges. Definite practice effects at each distance
were noted as well as a number of other important points in the
study. It was found that when the Columbia Round is used as a
measure of achievement in archery, “the scores at 40 yards are the
best measure of the ability of beginners, and the scores made at 50
yards the best measure of the ability of advanced archers.” The
relationships between height and total score and between weight and
total score were found to be too small to be significant. This fact is of
importance when consideration is to be given to the construction of
achievement scales.

The national study of achievement in archery, sponsored by the 
National Association, Directors of Physical Education for Women 
in Colleges and Universities, made possible the establishment of 
national norms from a representative sampling of scores made by 
women in colleges and universities throughout the United States.
The achievement scales constructed from the data thus collected 
enable teachers and students to evaluate achievement in archery 
and compare this achievement with that made in other sports. 

Camp Archery Association. The Camp Archery Association⁶
has set standards to stimulate interest in the sport of archery at
summer camps. Five ranks are designated: Yeoman, Junior Bowman,
Bowman, Archer and Silver Bow Archer. Standards are set
for thirty arrows at distances of from 15 to 40 yards. Awards are
available for achievement of the standards.

⁵ Hyde, Edith I., “The Measurement of Achievement in Archery,” Jr. of
Educ. Res., XXVII (May, 1934), 675-686.
Hyde, Edith I., “National Research Study in Archery,” Res. Quart. Am.
Phys. Educ. Assoc., Vol. VII, No. 4 (December, 1936), 64-75.
Hyde, Edith I., “An Achievement Scale in Archery,” Res. Quart. Am.
Phys. Educ. Assoc., Vol. VIII, No. 2 (May, 1937), 109-116.
Hyde, Edith I., “The Measurement of Achievement in Archery,” Jr. of
Educ. Res., XXVII (May, 1934), 686.
⁶ Camp Archery Association, 152 East 22nd Street, New York City,
New York.
⁷ Reichart, Natalie, “School Archery Standards,” Jr. of Health and Phys.
Educ., Vol. 14, No. 2 (February, 1943), 81.


Badminton 


Scott⁸ suggests two simple tests as measures of badminton playing
ability. The first test is one which measures skill in serving. It 
consists of serving twenty birds into a target marked in the opposite 
service court. A rope is stretched 20 inches above and parallel to 
the net. The second test measures ability to return a serve, and is 
again scored in relation to a target marked on the opposite court. 
A validity coefficient of .85 is reported for these two tests when a 
criterion of combined subjective ratings of playing ability and 
standings in tournament play was used. Reliability coefficients 
range from .77 to .98 for twenty trials on each test, but an examina- 
tion of scores reveals the possibility of limiting the test to ten trials 
each as a time-saving device, except when using the serving test
with beginners. A written objective examination accompanies 
this skill test and a grading plan for both is given. 
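Whether twenty trials can safely be cut to ten can be estimated with the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is multiplied by a factor k. The source does not say that Scott used this formula; the sketch below simply applies it, as an assumption, to the two ends of the reported range (.77 and .98 for twenty trials).

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test's length is multiplied by k.

    r is the reliability of the current test; k = 0.5 halves the test,
    k = 2 doubles it (the standard Spearman-Brown prophecy formula).
    """
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    # Reliabilities reported for twenty trials; project them to ten trials.
    for r20 in (0.77, 0.98):
        r10 = spearman_brown(r20, 10 / 20)
        print(f"20 trials r = {r20:.2f} -> 10 trials predicted r = {r10:.2f}")
```

Under this assumption the most reliable test (.98) would keep a predicted reliability of about .96 with ten trials, while the least reliable (.77) would drop to about .63, so shortening costs least where reliability is already high.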


Baseball 


Playground Baseball. Rodgers and Heath⁹ made an excellent
beginning in the development of batteries of tests to measure 
ability in the technique of various team sports. A representative 
sampling of game techniques in playground baseball chosen by 
twenty-three teachers of physical education resulted in the selection 
of five particular skills. This test battery was given to 700 fifth 
and sixth grade boys. Scale scores were set up by means of McCall's 
T-scale technique in order to compare performances in different 
events and for the purpose of obtaining a composite score. 
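The T-scale step is what lets events measured in feet, seconds or hits be added into one composite. The sketch below assumes the normalized form commonly described for McCall's T-scale: each raw score's midpoint percentile rank is converted to a normal deviate z and reported as T = 50 + 10z. The function names and the sample data are hypothetical and do not reproduce Rodgers and Heath's tables.

```python
from statistics import NormalDist


def t_scores(raw: list[float], higher_is_better: bool = True) -> list[float]:
    """Normalized T-scores (mean 50, SD 10) for one event.

    Each score's midpoint percentile rank (proportion below plus half the
    proportion tied) is mapped to a normal deviate z, then T = 50 + 10z.
    For timed events pass higher_is_better=False so faster times rank higher.
    """
    values = raw if higher_is_better else [-x for x in raw]
    n = len(values)
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    result = []
    for x in values:
        below = sum(1 for y in values if y < x)
        tied = sum(1 for y in values if y == x)
        proportion = (below + tied / 2) / n  # always strictly between 0 and 1
        result.append(50 + 10 * unit_normal.inv_cdf(proportion))
    return result


def composite(events: list[tuple[list[float], bool]]) -> list[float]:
    """Sum each pupil's T-scores across events (pupils listed in the same order)."""
    scaled = [t_scores(scores, higher) for scores, higher in events]
    return [sum(pupil) for pupil in zip(*scaled)]


if __name__ == "__main__":
    # Hypothetical data for five pupils: a throw (feet) and a run (seconds).
    throw = [90, 110, 100, 130, 120]
    run = [8.2, 7.9, 8.5, 7.6, 8.0]
    print([round(t, 1) for t in t_scores(throw)])
    print([round(c, 1) for c in composite([(throw, True), (run, False)])])
```

A purely linear alternative, 50 + 10(x - mean)/SD, is simpler but does not normalize skewed distributions; which form a given scale uses should be checked against its source.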

⁸ Scott, M. Gladys, “Achievement Examinations in Badminton,” Res. Quart.
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 (May,
1941), 242-253. See also: Scott, M. Gladys and French, Esther, Better
Teaching Through Testing, pp. 50-56. New York, A. S. Barnes and
Company, 1945.
⁹ Rodgers, Elizabeth G. and Heath, Marjorie L., “An Experiment in the
Use of Knowledge and Skill Tests in Playground Baseball,” Res. Quart.
Am. Phys. Educ. Assoc., Vol. II, No. 4 (December, 1931), 113-131.

The criteria used in judging the validity of the composite score
consist of the judgment ratings of teachers and squad leaders and
the success of individuals in making school teams. The correlation
coefficient between the composite score and the first criterion proved
to be r=.63 for fifth grade boys and .65 for sixth grade boys. This
order of correlation coefficient is not significant but is sometimes
considered satisfactory for the type of data in question. The battery
reliability coefficient (R=.83) seems fairly satisfactory for group
measurement. The reliability of the knowledge test, when the
statements were broken into chance halves, was found to be .89.
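Splitting the knowledge-test statements into chance halves gives each pupil two half-test scores whose correlation estimates reliability. The source does not say whether the .89 was stepped up to full test length, so the sketch below reports both the half-test correlation and the Spearman-Brown full-length estimate. It assumes each statement is marked 1 (right) or 0 (wrong); the random split and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(item_marks: list[list[int]], seed: int = 0) -> tuple[float, float]:
    """Return (half-test r, Spearman-Brown full-test estimate).

    item_marks holds one row per pupil and one 0/1 mark per statement.
    The statements are shuffled once and divided into two chance halves.
    """
    n_items = len(item_marks[0])
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    half_a, half_b = order[: n_items // 2], order[n_items // 2 :]
    score_a = [sum(row[i] for i in half_a) for row in item_marks]
    score_b = [sum(row[i] for i in half_b) for row in item_marks]
    r_half = pearson_r(score_a, score_b)
    return r_half, 2 * r_half / (1 + r_half)  # Spearman-Brown with k = 2


if __name__ == "__main__":
    # Hypothetical marks for four pupils on eight statements.
    marks = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 1, 0, 1],
        [0, 1, 0, 0, 1, 0, 0, 0],
    ]
    r_half, r_full = split_half(marks)
    print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```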
The authors of this experiment conclude that reliable criteria for
judging success in various physical abilities are badly needed, but
feel that test reliability and validity would have been higher in a
controlled situation.

Additional tests in playground baseball will be found in the
indicated publication.¹⁰


Basketball 


A Test of Ability and Progress in Basketball (Men). Edgren's
experiment in testing ability and progress in basketball suggests the
possibility of formulating a valid battery of tests for predicting
potential playing ability.¹¹ Though further experimental work
along this line seems necessary, there are definite trends in the
relationships existing between the sum of scores on a battery of
fundamental tests in basketball and playing ability as rated by
student coaches and scorers. In the final test this relationship
reaches .77. The experiment further indicates that progress in the
fundamental skills of basketball can be measured and that the most
satisfactory battery for measuring progress includes tests in speed
passing, passing for accuracy, speed dribble, dribble and shoot, and
ball handling. It is unfortunate that no reliability coefficients for
the various tests are listed.

Evaluating Abilities in Basketball Players (Men). To aid
the coach in selecting his squad and to motivate players in all-around
development, Money¹² has set up a battery of basketball tests, each
separate item of which can be used as a fundamental drill as well as
a measure of ability. No attempt has been made to validate the
battery, but it would appear that a representative sampling of
fundamental skills in basketball is included in the list of individual
tests. The list embraces physical efficiency, speed, and coordination,
accuracy in passing, accuracy in shooting, dribble and shoot, pivot
and shoot, and competitive shooting. The individual tests are well
conceived but should be scored in such a way as to produce the same
score in each for equivalent performance levels. A scale score of the
type described in Chapter XVI (equal-step interval plan) would fit
this situation admirably.

¹⁰ Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P.,
Achievement Scales in Physical Education Activities for Secondary
School Girls and College Women. New York, A. S. Barnes and Company,
1937.
¹¹ Edgren, H. D., “An Experiment in the Testing of Ability and Progress
in Basketball,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 1
(March, 1932).
¹² Money, C. V., “Tests for Evaluating the Abilities of Basketball
Players,” Athletic Journal, Vol. XIV, No. 3 (November, 1933), 32-34;
and No. 4 (December, 1933), 18-19.

Basketball Progress Tests (Boys and Men).¹³ This battery
of tests was devised as a check-up on some of the game fundamentals
in basketball. Here, again, no attempt was made to validate the
battery as a measure of playing ability, but the tests are sufficiently
objective to note progress and improvement through the season and
include passing for accuracy, pivoting for efficiency and form,
dribbling for speed and control, and shooting for accuracy. The
scoring would be improved very materially by the application of
available statistical techniques.

The Measurement of Ability in Women’s Basketball. In
the Young-Moser¹⁴ study the game was first analyzed into its
constituent skill elements, and a large group of individual tests (36)
was assembled for the preliminary try-out. Out of this number, twelve
tests were retained for further experimentation, and the sum of the
scores on these twelve tests was chosen as an objective criterion of
playing ability. In selecting a short battery of tests to measure
playing ability, the validity of the short battery was established by
correlating composite scores of the tests with a rating by expert
judges of the player's ability in a game situation. The resulting
correlation coefficient (r=.859) certainly indicates a fairly desirable
degree of validity. The final battery of five tests includes a bounce
and shoot, a speed pass, a free jump, Edgren’s ball handling test and
a moving target test. The correlation coefficients between each of
these tests and the criterion are fairly substantial, while the
intercorrelations of the subtests are rather low. This battery of tests
is highly recommended and should receive widespread use in athletics
for women. T-scales have been constructed for each of the tests in
the battery, and test procedures are outlined in this study as well as
in a subsequent article.¹⁵ Scale scores, computed on an equal-step
interval plan, are shown by permission of the test authors in the
reference listed.¹⁶ Also in this book will be found additional
basketball tests. Based upon the same principles as the Cozens,
Cubberley, Neilson¹⁷ scales are those by Russell and Lange¹⁸ for the
two additional events, basketball throw and catch, and basketball
dribble for distance.

¹³ Friermood, H. T., “Basketball Progress Tests Adaptable to Class Use,”
Jr. Health and Phys. Educ., Vol. V, No. 1 (January, 1934), 45-47.
¹⁴ Young, Genevieve and Moser, Helen, “A Short Battery of Tests to
Measure Playing Ability in Women’s Basketball,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. V, No. 2 (May, 1934), 5-23.

A Basketball Test for College Women. In a subsequent study,
Glassow et al.¹⁹ report that the Young-Moser²⁰ battery would be
as valuable if the Edgren Ball Handling Test were omitted, and
that it would be well to substitute the jump and reach test for the 
free jump when the test is to be used for grading purposes, since the 
free jump gives undue advantage to height. However, since height 
is an absolute value in basketball, and a low correlation was found 
between the jump and reach test and basketball playing ability, 
there is considerable argument in favor of the free jump item. These 
authors suggest a battery consisting of the bounce and shoot, zone 
toss, and wall speed pass. A validity coefficient of .66 was found 
when a criterion of student rankings of playing ability was used. 

¹⁵ Moser, Helen A., “The Use of Basketball Skill Tests for Girls and
Women,” Jr. Health and Phys. Educ., Vol. VI, No. 3 (March, 1935), 55-55.
¹⁶ Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P.,
Achievement Scales in Physical Education Activities for Secondary
School Girls and College Women. New York, A. S. Barnes and Company,
1937.
¹⁷ Ibid.
¹⁸ Russell, Naomi and Lange, Elizabeth, “Studies Relating to Achievement
Scales in Physical Education Activities,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. IX, No. 4 (December, 1938), 43-56.
¹⁹ Glassow, Ruth B., Colvin, Valerie and Schwarz, . . ., “. . .
Measuring Basketball Playing Ability of College Women,” Res. Quart.
Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 4 (December,
1938), 60-68.
²⁰ Young, Genevieve and Moser, Helen, op. cit.
²¹ Dyer, Joanna T., Schurig, Jennie C. and Apgar, Sara L., “A Basketball
Motor Ability Test for College Women and Secondary School Girls,” Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. X, No. 3
(October, 1939), 12 . . .

A Basketball Motor Ability Test.²¹ The items in this Basketball
Motor Ability Test were selected after a careful analysis of the
items used in other available basketball skill tests and of the
component skills of the game itself. The final battery includes a
moving target test, Edgren’s Ball Handling Test, bounce and shoot
test, and free jump and reach test. T-scores have been constructed
for each sub-test in the battery. The critical ratios in comparing a
known superior to a known inferior group were significant. Validity
coefficients for the sub-tests range from .76 to .91 as compared to
criteria of judgment ratings.
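A critical ratio of the kind mentioned above divides the difference between two group means by the standard error of that difference. The sketch below assumes independent groups and the usual standard error of a mean (sample standard deviation over the square root of N). Texts of the period commonly treated a ratio of about 3 or more as significant; that threshold is a convention, not a figure from this study, and the group data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(scores: list[float]) -> float:
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    return stdev(scores) / sqrt(len(scores))


def critical_ratio(superior: list[float], inferior: list[float]) -> float:
    """Difference between group means over the standard error of that difference."""
    diff = mean(superior) - mean(inferior)
    se_diff = sqrt(standard_error(superior) ** 2 + standard_error(inferior) ** 2)
    return diff / se_diff


if __name__ == "__main__":
    # Hypothetical sub-test scores for a known superior and a known inferior group.
    superior = [10, 12, 14]
    inferior = [4, 6, 8]
    print(f"critical ratio = {critical_ratio(superior, inferior):.2f}")  # 3.67
```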

Achievement Tests in Girls’ Basketball.²² This battery of
tests was designed for purposes of motivation as well as for measuring
the results of teaching. While it cannot be considered strictly as a
sport technique test of the type of the Young-Moser Basketball Test,
an attempt was made to partially validate the test items by expert
opinion. The five tests in the skill battery are: (1) bounce over a
six-foot area, (2) pass and catch, (3) jump and reach, (4) throw for
goal, and (5) pivot, bounce and shoot. From the performance
records of 1000 girls, achievement scales were set up and are
presented in the study. The knowledge test includes 50 true-false
questions, 15 completion questions, 20 best answer questions and
15 pictorial questions. The knowledge test is scaled with the skill
tests.


Field Hockey 


Field Hockey Achievement Tests for College Women. The
study of Schmithals and French²³ aimed to develop objective tests
of field hockey which could be used to determine the ability status
of students, as a guide to teaching emphases and to grouping
students homogeneously for instruction. A variety of test items was
studied. The authors concluded that the dribble, dodge, circular
tackle and drive test is the best single item to use for classification
purposes (validity, r=.44 with a subjective rating criterion;
reliability, r=.92), and when used in combination with a knowledge
test should serve for adequate early season classification. The best
combination of two skill factors proved to be a goal shooting left
test and a fielding and drive test. For these the intercorrelation is
r=.22, showing they measure different factors. The validity
coefficient is .60.

²² Schwartz, Helen, “Knowledge and Achievement Tests in Girls’
Basketball on the Senior High School Level,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. VIII, No. 1 (March, 1937), 143-156.
²³ Schmithals, Margaret and French, Esther, “Achievement Tests in Field
Hockey for College Women,” Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. XI, No. 3 (October, 1940), 84-92. See also:
Scott, M. Gladys and French, Esther, op. cit., pp. 59-63.

Field Hockey Achievement Scales. The following achievement
scales in field hockey are available:²⁴ dribble 25 yards, dribble
and push pass, obstacle dribble, penalty corner hit for beginners, 
corner hit (intermediate test), and corner hit (advanced test). 


Gymnastics 

As a basis for predicting potential ability in gymnastics,
Wettstone²⁵ proposes the following three-item battery: a strength
test consisting of chinning and dipping, the Burpee test, and a
measure of thigh circumference divided by height. A gymnastic
rating was used as the criterion score, and a validity coefficient of
.79 was found when the three items were equated by means of a
regression equation. This test is recommended for encouraging boys
who are likely to succeed to participate in gymnastics and tumbling
activities. Data were collected on college men students.
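Equating the three items “by means of a regression equation” means weighting them so that their combination best predicts the gymnastic rating; the validity coefficient is then the correlation between predicted and actual ratings (a multiple R). The least-squares sketch below is a generic illustration, not Wettstone's equation: the data are made up, and the item labels in the comments merely echo the three measures named in the text.

```python
from math import sqrt
from statistics import mean


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(predictors, criterion):
    """Least-squares weights [intercept, w1, w2, ...] from the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 for the intercept
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, criterion)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, p):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], p))


def multiple_r(weights, predictors, criterion):
    """Correlation between predicted and actual criterion scores."""
    pred = [predict(weights, p) for p in predictors]
    mp, mc = mean(pred), mean(criterion)
    sxy = sum((a - mp) * (b - mc) for a, b in zip(pred, criterion))
    sxx = sum((a - mp) ** 2 for a in pred)
    syy = sum((b - mc) ** 2 for b in criterion)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical rows: (chins + dips, Burpee count, thigh girth / height).
    items = [(12, 9, 0.30), (15, 10, 0.32), (8, 7, 0.29), (20, 11, 0.33),
             (10, 8, 0.31), (17, 12, 0.30), (14, 9, 0.34)]
    ratings = [62, 70, 50, 85, 58, 80, 71]  # hypothetical gymnastic ratings
    w = fit(items, ratings)
    print("weights:", [round(v, 2) for v in w])
    print("multiple R:", round(multiple_r(w, items, ratings), 3))
```

With so few made-up cases the fitted R will be optimistic; a real validation would check the equation on a fresh group.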


Football 

Achievement of College Men in Touch Football.²⁶ The
primary purpose of this study was to set up a short battery of tests
to measure ability to play the game of touch football. A judgment
criterion was established, and an objective criterion, when compared
with the subjective criterion, showed a substantial degree of validity
(r=.851). The battery of five tests (used to measure ability to play
the game) which had the highest multiple correlation with the
objective criterion (R=.925) consists of: (1) forward pass for
distance, (2) catching forward pass, (3) punting for distance, (4)
running 50 yards carrying the ball, and (5) pass defense (zone). The
administration of the battery is simple and economical, and the
battery can be used as a means of classifying students for purposes
of instruction and competition as well as aiding the instructor in
giving the student his course mark or semester grade.

²⁴ Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., op. cit.
²⁵ Wettstone, . . ., “Tests for Predicting Potential Ability in
Gymnastics and Tumbling,” Res. Quart. Am. Assoc. for Health, Phys.
Educ., and Rec., Vol. IX, No. 4 (December, 1938), 115-127.
²⁶ . . ., A Study of the Achievement of College Men in Touch Football.
Unpublished master’s thesis, University of . . . . See also: Report of
Subcommittee IV, “Report of the Committee on . . . , American Physical
Education Association,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VIII,
No. 2 (May, 1937), 73-78.


Achievement of College Men in Varsity Football. Brace²⁷
proposes a football achievement test designed for the purpose of
pre-selecting those players most likely to succeed in varsity football
and most worthy of continued attention from coaches.
The test items include forward pass at a target, 50-yard dash
carrying football, forward passing for distance, blocking, punting,
dodge and run, and charging. Players’ and coaches’ ratings of varsity
squad members were used as validity criteria. Coefficients were
significant but not high, which may be due in part to the fact that
the ratings were made on the basis of spring practice.


Ice Hockey 


Skill tests in ice hockey have been outlined by Brown²⁸ and
include: (1) dribbling and dodging, (2) goal shooting, and (3) speed 
skating and dribbling. No attempt has been made to arrange 
scoring tables so that equivalent performances may be represented 
by identical scores. 


Rhythm Tests 


Rhythmic Capacity of Physical Education Majors. From
a study of the Seashore Rhythm Test²⁹ with students majoring
in physical education, Annett³⁰ concludes that the test is fairly
satisfactory for use in predicting skill in motor rhythm. In view
of the comparatively low relationship between the test and a
criterion rating scale set up by expert judgment (r=.47), it does not
appear that his conclusion is justified. However, the fact that the
group is highly selected (majors in physical education) must be
given consideration, since it is practically impossible to obtain a high
correlation in a group of this type.

²⁷ Brace, D. K., “Validity of Football Achievement Tests as Measures of
Motor Learning and as a Partial Basis for the Selection of Players,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 14,
No. 4 (December, 1943), 372-377.
²⁸ Brown, Harriet M., “The Game of Ice Hockey,” Jr. of Health and Phys.
Educ., Vol. VI, No. 1 (January, 1935), 28-30, 54-55.
²⁹ This test has since been revised. See: Saetveit, Joseph G., Lewis, Don
and Seashore, Carl E., “Revision of the Seashore Measures of Musical
Talents,” University of Iowa Studies, New Series No. 388 (October,
1940), Iowa City, Iowa, The University of Iowa Press.
³⁰ Annett, Thomas, “A Study of Rhythmic Capacity and Performance in
Motor Rhythm in Physical Education Majors,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. III, No. 2 (May, 1932), 183-191.


Experiments by Lemon and Sherbon³¹ indicate that a practical
rhythm test can be set up which will be more useful for physical
education purposes than tests of the type of the Seashore Rhythm
Tests. This practical rhythm test includes large bodily movements
as well as a number of elements of rhythm such as tempo,
accent, stress and intensity. In the first part of the test, the
subject is required to express four different rhythm patterns, while
in the second part three specific tempos must be stepped in a given
time (ten seconds). The test makers feel that the results of their
experimentation are sufficiently significant to warrant further
study.

Measurement of Motor Response in Rhythm.³² Two
specific techniques were developed for measuring rhythmic motor
response. In the first, the subject was required to perform rhythmic
patterns on a specially constructed platform while standing upon
treadles. While performing these rhythmic patterns, the subject
held cords in her hands which were drawn through the base of the
platform and by their movements caused marks to be made on a
revolving drum. Cords were also attached to the feet and records
made on the drum in the same way. The drum record shows the
time in seconds taken by the subject to perform the rhythmic pattern
and, when complete, can be compared with a criterion and objectively
scored.

The second technique requires the subject to execute a rhythmic
pattern on the floor over a pathway sprinkled with flour. The
tempo of the pattern is set by a Victrola record and, when the
subject has completed the pattern, measurements of the mean ratio
between strides (distances between imprints of the great toe) are
computed. “If the cycle of movement is rhythmic, the ratio between
parts of the cycle should be uniform and a low average deviation
would indicate excellent rhythmic coordination.”³³
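The flour-pathway scoring reduces to simple arithmetic: take the distances between successive great-toe imprints, form the ratio between the parts of each movement cycle, and report the mean ratio with its average deviation. The sketch below assumes a two-stride cycle (first stride divided by second); how Shambaugh grouped strides into cycles is not stated here, so the cycle layout, the function names and the sample positions are hypothetical.

```python
from statistics import mean


def strides_from_imprints(positions: list[float]) -> list[float]:
    """Distances between successive great-toe imprints along the pathway."""
    return [b - a for a, b in zip(positions, positions[1:])]


def cycle_ratios(strides: list[float]) -> list[float]:
    """Ratio of the first to the second stride in each two-stride cycle.

    A trailing stride that does not complete a cycle is ignored.
    """
    return [strides[i] / strides[i + 1] for i in range(0, len(strides) - 1, 2)]


def average_deviation(values: list[float]) -> float:
    """Mean absolute deviation from the mean; lower means more uniform cycles."""
    m = mean(values)
    return mean(abs(v - m) for v in values)


if __name__ == "__main__":
    # Hypothetical imprint positions (inches) for a long-short stepping pattern.
    positions = [0, 24, 36, 60, 73, 96, 108]
    ratios = cycle_ratios(strides_from_imprints(positions))
    print(f"mean ratio = {mean(ratios):.2f}, "
          f"average deviation = {average_deviation(ratios):.3f}")
```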
Though these tests are not practical for large groups, because of the
time involved, they may be used as a basis for guidance, particularly
with major groups. Scores in both tests show a relatively high
correlation (r=.91), though only a small number of students were
used in the experimental group.

³¹ Lemon, Eloise and Sherbon, Elizabeth, “A Study of the Relationships
of Certain Measures of Rhythmic Ability and Motor Ability in Girls and
Women,” Suppl. Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1
(March, 1934), 82-85.
³² Shambaugh, Mary Effie, “The Objective Measurement of Success in the
Teaching of Folk Dancing to University Women,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. VI, No. 1 (March, 1935), 55-58.
³³ Shambaugh, Mary Effie, op. cit., p. 51.

Response to Auditory Rhythms.³⁴ In this study two types
of tests were developed, a written test and a tapping test. “The
written test consists of eighteen patterns, based on underlying beats
of two, three, or four, . . . beaten on a wooden temple block with
a wooden baton to obtain clearness and similarity of sound . . .
For the tapping test, one-half of the written test was used,” based
on the same eighteen rhythmic patterns. The experimenter
concludes that:


the written test is the more difficult and involves more factors than 
a test calling for the tap response. It could be used as a measure in 
the training of rhythmic perception. A person taking the written 
test needs to be trained in a scheme of graphic representation of 
her perception, as the written test is perhaps a measure of the 
individual's conscious perception of rhythm and is therefore a 
measure of her knowledge of rhythm in addition to her ability 
to perceive a pattern. It is possible that the tapping test reveals 
a person's ability for response to an auditory rhythm, and therefore 
might be used as a means of selection of those who have or have 
not such ability. 


Predictive Measures of Ability to Learn Dance Movements.
Benton³⁵ undertook a study to determine ability to perform
movement techniques in modern dance. She found that this ability
could be adequately measured with tests already available, since dance
movement is not entirely rhythm, and other elements such as agility,
motor educability, strength and balance are important. The
validating criterion was a combined judgment rating. Coefficients of
correlation of from .77 to .93 were found for five regression equations
which variously combine scores from a test composed of Johnson
Type stunts, a test composed of Brace Type stunts, the Static
Balance Test, the PET, the Seashore Series A Rhythm Test and a
Motor Rhythm Test.

³⁴ Buck, Nadine, “A Comparison of Two Methods of Testing Response to
Auditory Rhythms,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 3
(October, 1936), 36-46.
³⁵ Benton, Rachel Jane, “The Measurement of Capacities for Learning Dance
Movement Techniques,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 15, No. 2 (May, 1944), 137-144.


Soccer 


Soccer Skill Tests (1). To stimulate students to improve their
skill and to show the progress made during a season, Vanderhoof³⁶
has devised a battery of ten soccer skill tests. No attempt has been
made to validate the battery, but the tests represent important
elements in the game and include: (1) dribble, (2) trapping,
(3) throw-in, (4) place kick for accuracy, (5) punt for distance,
(6) volleying (by using forehead, shoulder, hip and knee),
(7) throw-down (securing the ball from an opponent in a 6-yard circle),
(8) tackling, (9) corner kick, and (10) goal keeper's test (skill in
preventing goals). Scoring has been made objective, but it is set up
empirically, with a possible 10 points for each test. If the battery
were validated and scored by a scheme worked out through correct
statistical procedures, it would show possibilities for use in
measuring ability to play the game.

Soccer Skill Tests (2). Heath and Rodgers³⁷ have made an experimental
study quite similar to that in playground baseball. Twenty-five
physical education teachers analyzed the skills of soccer, and from
this analysis four representative samplings of worthwhile techniques
in the game were selected: dribble, throw-in, place kick for goal and
kicking a rolling ball. The tests were given to a little over 2500
fifth and sixth grade boys and were scaled by McCall's T-scale. The
criterion of soccer playing ability was a judgment rating. A further
criterion was success in making school and intramural teams. The
correlation coefficients between judgment ratings and composite score
were r = .602 for fifth grade boys and r = .624 for sixth grade boys,
while the reliability coefficient for the test battery reached .801
in the sixth grade. No criterion of validity was set up for the soccer
knowledge test, but it was regarded as valid because of the way its
material was selected. The reliability coefficient of the knowledge
test, obtained by Spearman's formula for chance halves, was r = .903.
Practically no relationship was found between the composite score on
the skill tests and the score on the knowledge test. The authors feel
that the battery of skill tests may be used as


36 Vanderhoof, Mildred, “Soccer Skill Tests,” Jr. Health and Phys. Educ., Vol. III,
No. 8 (October, 1932), 42, 54-56.

37 Heath, Marjorie, and Rodgers, Elizabeth G., “A Study in the Use of Knowledge
and Skill Tests in Soccer,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III,
No. 4 (December, 1932), 33-55.


204 The Status of Measurement in Physical Education 


a valid instrument for group measurement. It should be pointed
out, however, that a much higher validity coefficient must be secured
before the test battery can rate individuals according to their ability
in soccer. A lengthened test battery, improvement in the reliability of
individual test items, and a change in the manner of submitting
judgment ratings might all combine to secure a higher validity
coefficient.
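The phrase “Spearman's formula for chance halves” means the Spearman-Brown formula. The test is split into two halves (for example, odd- and even-numbered items), and the two half scores are correlated. That half-test correlation r is then stepped up to estimate the reliability of the whole test as 2r / (1 + r). The general form is kr / (1 + (k - 1)r) for a test made k times as long. It also gives a rough idea of how much reliability the lengthened battery suggested above might gain. This is a minimal sketch. The half-test correlation of .80 is hypothetical; .801 is the battery reliability reported above.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Estimated reliability of a test made k times as long.

    r -- correlation between two halves (use k=2), or the reliability
         of the present test when projecting a longer version
    k -- factor by which the test is lengthened
    """
    return k * r / (1 + (k - 1) * r)


# Split-half: hypothetical correlation of .80 between the two chance halves
print(round(spearman_brown(0.80), 3))

# Projected reliability if a battery with reliability .801 were doubled
print(round(spearman_brown(0.801, k=2), 3))
```

Note that the formula estimates reliability only. A more reliable battery raises the ceiling on the validity coefficient, but it does not guarantee a higher validity.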

Tables for converting raw scores into T-scores in the skill and 
knowledge tests are presented. 
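Heath and Rodgers scaled their tables with McCall's T-scale, which is a normalized scale. Each raw score is placed at the percentage of the group scoring below it plus half of those obtaining exactly that score. That percentage is converted to a normal deviate z, and T = 50 + 10z. The sketch below follows this recipe using Python's standard-library `NormalDist`. The function name and the sample scores are hypothetical, and published tables may group and round scores somewhat differently. For events in which a smaller number is better (timed runs, for example), the ordering would have to be reversed.

```python
from collections import Counter
from statistics import NormalDist


def mccall_t_scores(scores):
    """Map each distinct raw score to a normalized T-score (mean 50, SD 10).

    Assumes a larger raw score is a better performance.
    """
    n = len(scores)
    counts = Counter(scores)
    table = {}
    below = 0  # number of cases scoring below the current value
    for value in sorted(counts):
        # Percentile position of the midpoint of this score's frequency
        p = (below + counts[value] / 2) / n
        z = NormalDist().inv_cdf(p)  # normal deviate for that proportion
        table[value] = round(50 + 10 * z)
        below += counts[value]
    return table


raw = [12, 15, 15, 18, 18, 18, 20, 20, 23, 25]  # hypothetical raw scores
for score, t in mccall_t_scores(raw).items():
    print(score, t)
```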

Soccer Skill Test for the Fifth and Sixth Grade. Bontz³⁸
developed a soccer test for use with fifth and sixth grade children.
It combines a straightaway dribble with a side pass to a wall and
recovery, followed by a continued dribble and kick for goal. This
test produced a reliability coefficient of .96 and a validity
coefficient of .92 with a subjective rating criterion. The test
is simple to administer and score and is of interest to children of
this age.

Soccer Skill Tests for Ninth and Tenth Grade Girls. Schaufele³⁹
proposed several tests to measure ability in soccer, and three of them
are described here. The first is a repeated volley test, in which the
soccer ball is kicked successively against a wall. The second is a
passing and receiving test, which involves dribbling the ball, passing
it against a wall, controlling it on the rebound and repeating. The
third is a judgment in passing test, which requires a dribble and kick
for goal. With a subjective rating of player ability as the criterion,
the coefficients of validity were .57, .50 and .34 respectively for the
three tests. With a total battery criterion they were .77, .72 and .82.
Reliability coefficients were .67, .72 and .69. These tests serve as
excellent practice devices, because all involve techniques which
closely approximate game situations.

Achievement Scales in Soccer and Speedball.⁴⁰ Statistically correct
scoring scales are offered here for a battery of seven tests
representing elements of the game. No attempt has been made to
validate the battery as a measure of ability to play the game.
38 Bontz, Jean, An Experiment in the Construction of a Test for Measuring Ability
in Some of the Fundamental Skills Used by Fifth and Sixth Grade Children in
Soccer. Unpublished master's thesis, University of Iowa, 1942. See also: Scott,
M. Gladys and French, Esther, op. cit., pp. 79-82.

39 Schaufele, Evelyn F., The Establishment of Objective Tests for Girls of the Ninth
and Tenth Grades to Determine Soccer Ability. Unpublished master's thesis,
University of Iowa, 1940. See also: Scott, M. Gladys and French, Esther, op.
cit., pp. 71-79.

40 Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., op. cit.


Speedball


Buchanan⁴¹ developed a series of speedball tests which measure
the skills of lift to others, throwing and catching, kick-ups, dribbling,
and passing. Validity coefficients for the battery range from .57 to
.88, with a validating criterion of combined teacher ratings of playing
ability. Reliability coefficients for all tests were significantly high.
T-scales for each test were computed. If time does not permit all
tests to be completed, a short form is suggested, consisting of the
throwing and catching test plus three times the passing test.
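Buchanan's short form weights the passing test three times as heavily as the throwing and catching test. The sketch below shows that weighting only. It assumes both scores are already on a common scale (for instance, T-scores from Buchanan's tables), which the text does not specify. The function name and the sample values are hypothetical.

```python
def speedball_short_form(throw_catch: float, passing: float) -> float:
    """Short-form composite: throwing and catching plus three times passing.

    Assumption: both inputs are on the same scale (e.g. T-scores);
    the 1-to-3 weighting follows the short form described in the text.
    """
    return throw_catch + 3 * passing


# Hypothetical player: T-score 55 in throwing and catching, 48 in passing
print(speedball_short_form(55, 48))
```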
Swimming 

A review of the literature relating to tests in swimming and life
saving reveals a considerable body of material which may or may
not have meaning for the teacher of physical education. A number
of investigators⁴² use various schemes to classify swimmers and give
them semester grades, yet few investigations have attempted to set
up scales by which the value of a swimming performance may be
judged. Numerous swimming tests, measured in objective units, have
been offered with standards for advancement from the beginners to
the intermediates, from the intermediates to the advanced group,
and so on. More attention must still be given to developing scientific
measuring instruments based on the performances of large numbers
of individuals.

A rather complete set of tests for women has been outlined for use 
41 Buchanan, Ruth E., A Study of Achievement Tests in Speedball for High School
Girls. Unpublished master's thesis, University of Iowa, 1942. See also: Scott,
M. Gladys and French, Esther, op. cit., pp. 98-106.

42 Troemel, Ernestine A., “Swimming: On an Efficient Grading Basis,” Am.
Phys. Educ. Rev., XXXIII (June, 1928), 414-418.
Wayman, Agnes R., Education through Physical Education. Philadelphia, Lea
and Febiger (3rd ed.), 1934.
Spindle, E., “Do You Grade or Guess?” Jr. Health and Phys. Educ., Vol.
II, No. 8 (October, 1931), 26-28, 48.
Anderson, Charlotte, “Achievement Records in Swimming,” Jr. Health and
Phys. Educ., Vol. I, No. 5 (May, 1930), 40.
Playground and Recreation Association of America, “Swimming Badge Tests
for Boys and Girls,” Am. Phys. Educ. Rev., XXXIV (May, 1929), 298-304.
Smith, Ann Avery, “Aids to Efficient Swimming Instruction for Girls and
Women,” Jr. Health and Phys. Educ., Vol. II, No. 7 (September, 1931), 32-35,
45-46.
Sheffield, Lyba and Nita, Swimming Simplified, Chapter X. New York, A. S.
Barnes and Company, 1927.
Kiphuth, Robert J. H., Swimming, Chapter II. New York, A. S. Barnes and
Company, 1942.




by Parkhurst⁴³ at the University of Texas. This plan provides
achievement standards or goals in each of four groups: Beginners,
Intermediates, Advanced and Life Saving. Both optional and required
goals have been established, each with a point value. The plan would
appear to have possibilities for development once proper measuring
techniques are put into operation.

The student of swimming tests should be familiar with the numerous
tests set up by organizations sponsoring “water safety” throughout
the country. Prominent among these tests are:


American Red Cross Beginner's Test, Intermediate Swimmer
Test, Swimmer Test, and Advanced Swimmer Test⁴⁴

American Red Cross Functional Swimming and Water Safety
Test⁴⁵

Swimming Merit Badge Test, Boy Scouts of America⁴⁶

Swimming Badge Tests, National Recreation Association⁴⁷

Swimmer's Test, National Young Men's Christian Association⁴⁸

Swimmer's Test, Girl Scouts⁴⁹

Swimmer's Test, Camp Fire Girls⁵⁰

A. A. U. Physical Fitness Swimming Tests⁵¹


Cureton's Swimming Test for Beginners (Rotational
Method).⁵² In attempting to devise new and better ways of teaching
swimming to beginners, Cureton has done a considerable amount
of experimentation at Springfield College. Out of this experimental
work has come the Rotational Method. This consists of guiding the
beginner through a series of twenty-five progressive tests which in
reality represent “the essential steps required to learn how to swim.”
Says Cureton,⁵³ “The Rotational Method aims to keep the individual
in sight at each stage by means of an accurate record of
performance.”

43 Parkhurst, Mary Grant, “Achievement Tests in Swimming,” Jr. Health and
Phys. Educ., Vol. V, No. 5 (May, 1934), 54-56, 58.

44 American National Red Cross, Instructor's Manual: Swimming and Diving
Courses. Washington, D. C., The American Red Cross, 1938.

45 American National Red Cross, Instructor's Guide: Functional Swimming and
Water Safety Training Course. Washington, D. C., The American National Red
Cross, 1945, p. 23.

46 Boy Scouts of America, Handbook for Boys, 5th Edition, pp. 522-523. New
York, The Boy Scouts of America, 1948.

47 “. . . the Swimming Badge Tests,” Recreation, XXVII (July, 1933),
183, 205.

48 Y.M.C.A., Beginning and Intermediate National YMCA Progressive Aquatics
Tests. New York, Association Press, 1948.

49 The Girl Scout Handbook, p. 498. New York, National Headquarters, Girl
Scouts, Inc., 1947.

50 Camp Fire Girl's Manual. New York, Camp Fire Outfitting Company, 1948.

51 Evans, Dorothy, “Swimming to Physical Fitness,” Jr. Health and Phys. Educ.,
Vol. 15, No. 4 (April, 1944), 193.

52 Cureton, Thomas K., Jr., How to Teach Swimming and Diving, Vol. I, Chapter
IX. New York, Association Press, 1934.
Cureton, Thomas K., Jr., Standards for Testing Beginning Swimming. New
York, Association Press, 1939. Same digested: Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. X, No. 4 (December, 1939), 54-59.

In Step No. 1 a placement test is given in order that students may 
be grouped as homogeneously as possible. A questionnaire regarding 
the individual’s swimming background is also given so that the in- 
structor may know the experience of the individuals in his classes. 

Step No. 2 consists of a demonstration of the twenty-five be- 
ginner’s skills and the performance of each student in each skill. The 
twenty-five skills progress gradually from easy to difficult and in- 
clude stunts designed to test confidence in the water, buoyancy, 
initial ability in diving and swimming.

Step No. 3 comprises an individual analysis of difficulties encountered
by each student. This analysis is made from a master
chart and a study of the questionnaires, and instruction is organized 
on the basis of the diagnosis. 

The final step (No. 4) consists of drills designed to eliminate 
deficiencies and to allow swimmers to progress from group to group 
as their ability warrants. 

Such a scheme as the Rotational Method requires excellent organ- 
ization and a considerable amount of clerical work, but is in accord 
with the best teaching procedures and permits definite measure- 
ments of the results of teaching and continuous development of skills. 

Cureton's Intermediate and Advanced Tests.⁵⁴ Without
question Cureton's work in the development of achievement scales
for swimming is the outstanding research in this field in the United
States. The principal characteristics of the intermediate tests are:
(1) a progressive order of difficulty based on experimental evidence,
(2) arrangement of tests so that difficulties can be diagnosed at a
glance, and (3) motivation by means of a percentile chart.
53 Cureton, Thomas K., Jr., How to Teach Swimming and Diving, p. 160.

54 . . . Practise of Intermediate Aquatics, pp. 89-130. Springfield, Mass.,
Springfield College, 1935 (mimeographed).
Cureton, Thomas K., Jr., Objective Scales for Rating Swimming Performance and
. . . Springfield, Mass., Springfield College, 1935 (mimeographed).




The first test for intermediates consists of twenty-five items 
arranged in five groups on a pass or fail basis. The second set of
tests is entirely objective and is measured in terms of time, speed 
and distance or number of strokes or skills. These may all be rated 
on a percentile basis so that scores can be added or averaged to give 
total or composite ability. 
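The second set of intermediate tests is measured in unlike units (time, distance, number of strokes or skills). Each item must therefore be rated as a percentile before scores can be added or averaged. The sketch below uses a common definition of percentile rank: the percentage of a norm group doing worse, plus half of those tied. It assumes hypothetical norm groups and does not reproduce Cureton's own percentile charts. For timed items a smaller number is better, so the comparison is reversed.

```python
def percentile_rank(value, norms, lower_is_better=False):
    """Percentile rank of value within a list of norm-group scores."""
    if lower_is_better:
        worse = sum(1 for x in norms if x > value)  # slower times are worse
    else:
        worse = sum(1 for x in norms if x < value)  # smaller counts are worse
    tied = sum(1 for x in norms if x == value)
    return 100 * (worse + tied / 2) / len(norms)


# Hypothetical norms: a timed swim in seconds, and a count of skills completed
time_norms = [30, 32, 33, 35, 36, 38, 40, 41, 43, 45]
skill_norms = [3, 5, 6, 8, 9, 10, 12, 13, 15, 16]

time_pr = percentile_rank(35, time_norms, lower_is_better=True)
skill_pr = percentile_rank(10, skill_norms)
composite = (time_pr + skill_pr) / 2  # averaged, as the text allows
print(time_pr, skill_pr, composite)
```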

The tests for advanced swimmers are given particularly to analyze 
major needs and to discover special aptitudes, e. g., sprint or en- 
durance swimming, and are divided into seven groups. Each 
separate item of the forty-four may be measured in definite units 
and converted into an achievement score from percentile charts. 
Among the forty-four items are tests specially devised for particular
phases of swimming, such as the endurance test briefly described in 
the next section. 

Test for Endurance in Speed Swimming. Cureton⁵⁵ points
out that “since endurance is such a dominant factor in success, the
coach needs a test to determine the status of his men in this quality,”
and has devised a reliable endurance index which is useful for “(1)
motivation of the swimmer, (2) information about the swimmer as
to the relative perfection of his speed and also his endurance condition,
and (3) determining the amount of improvement which takes
place during the season, or during any interval, in condition or in
perfection of stroke.”

To compute the swimmer's rating for 100 yards in a 60-foot pool,
obtain his fastest time for the first lap (60 feet) and note the
drop-off index (the time of the last lap minus the time of the first
lap). His time for 100 yards in perfect condition should then be:


$$\text{Time for 100 yards} = 5\,(\text{time for 60-foot sprint}) + 2.25\,(\text{drop-off index}) + 5.5$$
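The formula can be applied directly. This minimal sketch takes the constants (5, 2.25 and 5.5) exactly as they are printed here. Times are assumed to be in seconds, which the text does not state, and the swimmer's times and the function name are hypothetical. Comparing the time actually swum with the predicted time shows how far the swimmer falls short of the time expected in perfect condition.

```python
def cureton_100_yard_standard(sprint_60ft: float, first_lap: float, last_lap: float) -> float:
    """Expected 100-yard time in a 60-foot pool for a swimmer in perfect
    condition, using the constants as printed (times assumed in seconds)."""
    drop_off = last_lap - first_lap  # drop-off index: last lap minus first lap
    return 5 * sprint_60ft + 2.25 * drop_off + 5.5


standard = cureton_100_yard_standard(sprint_60ft=10.0, first_lap=10.0, last_lap=12.0)
actual = 62.3  # hypothetical time actually swum
print(standard, round(actual - standard, 1))
```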

A performance scale for securing a percentile rating has been 
computed by Cureton so that the swimmer may know where he 
stands both in speed and endurance.


Speed Swimming Scales for Secondary School Girls and
College Women.⁵⁶ These are shown for free style, breast stroke
and racing back stroke at 20, 30, 40, 50 and 60 yards.


55 Cureton, Thomas K., Jr., “A Test for Endurance in Speed Swimming,” Suppl.
to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2 (May, 1935), 106-112.

56 Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., op. cit.




Achievement Scales in Wartime Swimming. Hewitt⁵⁷ has
made available achievement scales in events included in wartime 
swimming programs, where the emphasis was placed on endurance, 
life saving skills, and energy conserving strokes rather than upon 
speed swimming. The scales are based upon 3,000 performances 
by college men students in a 25-yard pool, and include 20-yard 
and 25-yard underwater swim in seconds; 15-minute endurance 
swim; and glide and relaxation ability: number of strokes used 
in 50 yards for elementary back stroke, side stroke and breast 
stroke. 

Naval Aviation Swimming Standards. The United States 
Navy Aviation Training Division developed comprehensive stand- 
ards of achievement for use in its wartime swimming program. 
The tests were based upon progressively increasing difficulty of 
skills, and were designed to measure improvement as the cadet 
moved from one school or base to the next. The tests began with 
simple “D” and “C” tests, which measure ability to swim, tread
or float for at least five minutes, and ability to perform four
elementary strokes. The remainder of the tests, including “B,” “A,”
“AA,” and “AAA” Tests, became increasingly more difficult, and
embraced elements of endurance swimming, life saving, and rescue
techniques. A Maintenance Check Test, to be administered every 
two months to officers, was designed to appraise the maintenance 
of endurance in the water. 

In addition, a series of “check out” tests was used to evaluate
each separate skill taught in the swimming program specifically.
These tests embodied, singly and in combination, the skills deemed
essential to all Navy fliers. They were used for grading or rating
cadets, and also to appraise the effectiveness of instruction.⁵⁸ The
tests were primarily concerned with life saving and rescue techniques
such as clearing oneself from a submerged plane cockpit, launching
and inflating a life raft, boarding a rubber boat, ascending and
descending a cargo net, inflating clothing for use as a life buoy,
and others, all arranged in logical sequence to parallel actual
conditions.

57 Hewitt, Jack E., “Achievement Scale Scores for Wartime Swimming,” Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 4 (December,
1943).

58 . . . Physical Training Manuals, Swimming, Chapter XIII, “Tests and Testing
Procedures,” pp. 207-226. Annapolis, Maryland, United States Naval Institute, 1944.




Tennis 


Essential Qualities Necessary for Tennis Players. Beall⁵⁹
first secured expert opinion on the qualities necessary to play excel- 
lent tennis. Because of the large number of these qualities (23 in 
all), it was decided to divide them into groups as follows: 


1. Organic condition. 

2. Muscular coordination. 

3. Accurate knowledge of strokes — forehand drive, backhand 
drive, overhead serve. 

4. Interest. 

5. Agility — speed of arms and legs. 

6. Qualities which make for success in any game, such as aggressiveness,
concentration, good judgment, patience, perseverance,
strategy, etc. No attempt was made to measure this group of
qualities.


Since no measuring devices existed for these qualities, specific tests
were outlined for the first five groups. The tests were given to 174 women
students in tennis classes meeting twice a week at the University
of California. Each student was tested at the beginning of the
semester and again after a rather brief period of instruction and
practice.

A number of general conclusions in regard to the study may be
pointed out:⁶⁰


1. A test which would measure tennis coordination would have 
to be based on the specific elements used in the various strokes, and 
not on general coordination. 

2. The period of instruction was too brief to draw accurate con- 
clusions in regard to practice as a means of measuring interest. 

3. The quality of accurate knowledge of strokes, as shown by the 
tests for the forehand drive, backhand drive, and overhead serve is 
measurable by such types of tests as were used in this study. 

4. The success of the tests given in this study shows that simple
tests for measuring many of the qualities necessary for good tennis
players could be devised at once.


59 Beall, Elizabeth, “Essential Qualities in Certain Aspects of Physical Education
with Ways of Measuring and Developing the Same,” Am. Phys. Educ. Rev.,
XXXIII (June, 1928), 390-397; (September, 1928), 454-463; (October, 1928),
516-520; (November, 1928), 582-585; (December, 1928), 646-649.

60 Beall, Elizabeth, op. cit., Am. Phys. Educ. Rev., XXXIII (December, 1928), 648.




Tests to Determine Progress in Tennis. In connection with
an article on tennis technique, Edgren⁶¹ offers a battery of five
indoor tests which gives the teacher a method of measuring present
ability and progress over a given period. The test battery includes
the following game fundamentals: (1) serve for accuracy, (2) speed
in stroking, (3) forehand stroke for accuracy, (4) backhand stroke
for accuracy, and (5) volleying. Scoring methods are outlined, but
these should be changed to equate similar performances in the
various tests. No attempt has been made to validate the test
battery as a measure of playing ability.

The Dyer Backboard Test of Tennis Ability.⁶² This test
“consists in rallying a tennis ball against a backboard, trying to
score as many hits as possible in the time limit of thirty seconds.”
The test seems to be very objective. Reliability coefficients at
various institutions (colleges) range from .84 to .90. The criterion
used to establish validity consisted of the judgment of experts. The
test scores were correlated with judgments in five different sets of
test data. With the most trustworthy data the correlation coefficients
run between .85 and .90, which supports the author's claim of
validity. This test appears to hold great promise as a measuring
device in certain phases of tennis ability and should be used in the
classification of groups and in measuring progress.

In a later revision, Dyer added a restraining line to the original
directions for the test. This served to increase the validity to .92.
T-scales for the new method of scoring appear in the second
article.⁶³ The Cozens, Cubberley and Neilson scale⁶⁴ was constructed
for the first method, but it was found to hold for the new method as
well. Scales for other tennis tests also appear in this source.

61 Edgren, H. D., “Tennis Technique,” Jr. Health and Phys. Educ., Vol. V, No. 5
(May, 1934), 50-51, 56.

62 Dyer, Joanna Thayer, “The Backboard Test of Tennis Ability,” Suppl. to Res.
Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 1 (March, 1935), 62-74.

63 Dyer, Joanna Thayer, “Revision of the Backboard Test of Tennis Ability,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 1 (March,
1938), 25-31.

64 Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., op. cit.

65 Wagner, Miriam M., “An Objective Method of Grading Beginners in Tennis,”
Jr. Health and Phys. Educ., Vol. VI, No. 3 (March, 1935), 24-25, 79.

Grading Beginners in Tennis.⁶⁵ This test includes both a
skill test and a written test. The student is marked on a percentage
basis, the skill test counting two thirds and the written test
one third. The elements in the skill test include: (1) forehand drive, 
(2) backhand drive, (3) forehand drive with footwork, (4) backhand 
drive with footwork, and (5) service. In all, the student has an 
opportunity to stroke 50 balls and the score is obtained by multi- 
plying the number of good trials by two. Experimentation shows 
that the results represent a normal distribution of ability. 

The written test includes rules, court position and tactics, and 
knowledge of good form in the strokes practiced. 
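The mark can be computed exactly. With 50 balls, doubling the number of good trials gives a skill score out of 100. The final percentage then weights the skill score two thirds and the written test one third. This is a minimal sketch. It assumes the written test is also scored as a percentage, which the text implies but does not state. The function name and sample numbers are hypothetical.

```python
def tennis_mark(good_trials: int, written_percent: float) -> float:
    """Percentage mark for a beginning tennis player.

    good_trials     -- number of good strokes out of 50 balls
    written_percent -- written-test score, assumed to be a percentage (0-100)
    """
    if not 0 <= good_trials <= 50:
        raise ValueError("good_trials must be between 0 and 50")
    skill_percent = 2 * good_trials  # 50 balls x 2 points = 100 possible
    # Skill counts two thirds, written test one third
    return (2 * skill_percent + written_percent) / 3


print(round(tennis_mark(good_trials=36, written_percent=81), 1))
```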

It would appear that this test as a whole has possibilities for 
validation and is a most useful device for marking the ability of 
beginning tennis players. 


Table Tennis 


To serve the threefold purpose of motivation, measurement of
achievement, and quick classification for tournament players, Mott
and Lockhart⁶⁶ have developed a table tennis backboard test similar
in idea to the Dyer Backboard Test of Tennis Ability. A half table
is propped perpendicular to a post or wall. The adjoining half table
is placed against it, horizontal to the floor. The score for the test
is the number of rallies made against the backboard half of the
table in thirty seconds without fault, and a number of faults are
stipulated. T-scores derived from scores of college women are
given. The reliability and validity coefficients are reported as .90 and
.84 respectively, although the validating criterion is not indicated.
The test is economical of time and easily scored.


Track and Field 


By far the largest assortment of tests, scoring tables and achievement
scales is to be found under this heading. Some detail in
connection with tests of this type has been given in Chapter V.⁶⁷


66 Mott, Jane A. and Lockhart, Aileen, “Table Tennis Backboard Test,” Jr. of
Health and Phys. Educ., Vol. 17, No. 9 (November, 1946), 550-551.

67 The more recent tests in track and field events cited in connection with the
discussion in Chapter V include:
1. McCloy's Scoring Tables.
2. Achievement Scales for Boys and Girls in Elementary and Junior High
Schools.
3. Achievement Scales for Boys in Secondary Schools.
4. Achievement Scales for College Men.
5. Achievement Scales for Secondary School Girls and College Women.
6. National Standards of Achievement for Boys and Girls.




Among the others deserving of special mention are the following: 


Scoring Tables for College Women.⁶⁸ Besides setting up
scoring tables in three events according to the T-scale technique,
Mitchell shows very definitely that none of the factors of age, height
or weight has any appreciable effect upon performance in these
events. College women therefore constitute a homogeneous group.
As has been pointed out previously, this lack of relationship
eliminates the necessity for any age-height-weight grouping with
college women.

Percentile Scales for Boys.⁶⁹ In this study percentile tables
are set up according to age only in seven events: foul shooting,
50-yard dash, pull-up, push-up, running high jump, running broad
jump, and shot-put (8 pounds and 12 pounds). The tables are
arranged so that the boy's percentile rank can be read directly from
his age and performance. The study has been carefully made from over
44,000 records. It would have been much more valuable if height and
weight, as well as age, had been considered. Both of these factors
correlate significantly with performance, so it is important to consider
them in any scoring scheme covering the age range of the elementary
and secondary schools.

A Fall Decathlon for College Track Squads. Cozens⁷⁰ proposes a
fall decathlon, complete with scoring charts for each event. Its
purpose is to stimulate all-around competition and to evaluate
scientifically the individual abilities of track squad members.
Because this decathlon is designed for fall competition, distances are
not standard, and the hop, step and jump is substituted for the pole
vault. It is suggested that the events be administered over a three-day
period in the following order: first day, 75-yard dash, 12-lb.
shot-put, standing hop, step and jump, and 330-yard run; second
day, 120-yard low hurdles (5 hurdles placed 20 yards apart), running
high jump, and 660-yard run; third day, running broad jump, discus
throw, and 1320-yard run. The scoring scheme based upon a 1000


68 Mitchell, Viola, “Scoring Tables for College Women in the Fifty-Yard Dash,
. . . the Basketball Throw for Distance,” Suppl. to Res. Quart. Am. Phys. Educ.
Assoc., Vol. V, No. 1 (March, 1934), 86-91.

69 Conger, Ralph G., Percentile Scales of Seven Physical Achievement Tests for Boys
From Twelve to Nineteen. Published privately by the author at Central High . . .

70 Cozens, Frederick W., “. . . for Track Squads,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. IX, No. 2 (May, 1938), 3-14.




point maximum is scientifically constructed using the increased 
increment principle. Equivalent scoring tables for each event are 
shown. 


Volleyball 


Practice Tests in Volleyball Skills. For the purpose of 
developing skill in both individual and team play, Reynolds⁷¹
devised a series of four volleyball tests for use in the South Park 
Playgrounds, Chicago. As in the case of many other batteries, his 
tests are used for their practice effects in the elements of the game 
and include: (1) a service test, (2) a net return or set-up, (3) a return 
from back court, and (4) an overhead pass to the back court from 
the net. It is unfortunate that no standards of performance are 
offered with the test descriptions. 

Achievement Scales in Volleyball Skills. Achievement scales 
in four volleyball tests are offered in Achievement Scales in Physical 
Education Activities for Secondary School Girls and College Women.⁷²

Achievement Tests in Volleyball for High School Girls.⁷³
French and Cooper first analyzed the skills used by girls in volleyball
game situations. From this analysis they formulated directions for
administering the following tests: (1) repeated volleys, (2) serving,
(3) set-up and pass, and (4) recovery from the net. The criterion for
judging the value of the tests consisted of ratings made by four
trained judges. To check these ratings, the sum of the ratings of two
judges was correlated with the sum of the ratings of the other two.
The ratings were found to be consistent (r = .881 and r = .914 for two
groups). In determining the validity of the battery of four tests with
two groups, the relationships were found to be R = .815 and R = .620.
The authors feel that in all probability the difference in correlation
arises because the second group was composed of girls who had very
little experience in volleyball and who were tested in large groups.
The tests are recommended as teaching devices and for classifying
students into groups of similar ability, and they will be found
valuable for diagnosing weaknesses.
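The consistency check described above pairs the judges. For each player, the sum of two judges' ratings is correlated with the sum of the other two judges' ratings. The sketch below does this with a hand-written Pearson product-moment correlation. The same function would also serve for the validity coefficients in this chapter, where test scores are correlated with a judgment criterion. The ratings and the judge labels are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings (1-5) of six players by four judges
judge_a = [4, 2, 5, 3, 1, 4]
judge_b = [5, 2, 4, 3, 2, 4]
judge_c = [4, 1, 5, 2, 2, 5]
judge_d = [4, 2, 5, 3, 1, 3]

# Sum the judges in pairs, then correlate one pair's totals with the other's
pair_1 = [a + b for a, b in zip(judge_a, judge_b)]
pair_2 = [c + d for c, d in zip(judge_c, judge_d)]
print(round(pearson_r(pair_1, pair_2), 3))
```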


71 Reynolds, Herbert J., “Volleyball Tests,” Jr. Health and Phys. Educ., Vol. I,
No. 3 (March, 1930), 42-44.

72 Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., op. cit.

73 French, Esther L. and Cooper, Bernice I., “Achievement Tests in Volleyball for
High School Girls,” Res. Quart. Am. Phys. Educ. Assoc., Vol. VIII, No. 2 (May,
1937), 150-157.




Russell and Lange⁷⁴ provide scoring scales for two items from the
French and Cooper battery, namely, the serving test and the re- 
peated volleys test, though the latter is slightly modified. Russell 
and Lange found these two items sufficiently reliable, and a satis- 
factory battery to use with junior high school girls. 

University of Wisconsin Volleyball Skill Tests. Bassett,
Glassow and Locke⁷⁵ conducted extensive studies on the validity
of various types of volleyball skill tests. They suggest two tests. The
first is a serving test to measure force of serve, placement of serve,
and ability to get the ball across the net. The second is a volleying
test designed to measure reaction timing, passing, receiving, and
accuracy of placement. Composite ratings by three judges were used
as a validating criterion. Validity coefficients of .79 for the serving
test and .51 for the volleying test are reported. Reliability
coefficients of .84 and .89 were found for the serving and volleying
tests respectively.

Repeated Volleys Test. Brady⁷⁶ suggests that a volleying test
consisting of repeated volleys against a wall within given boundaries
is a valid test of volleyball playing ability for college men. Reli-
ability for this test is .925, and the validity coefficient is .86 when
a combined judgment rating criterion was used.
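The validity coefficients above are correlations between test scores and a criterion of pooled judges' ratings. The sources do not say which formula was used, so the sketch below assumes the Pearson product-moment r, averaging the judges' ratings player by player to form the criterion. The function names and all scores are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import mean


def pooled_ratings(*judges):
    """Average the ratings several judges gave the same players.

    Each argument is one judge's list of ratings, in the same player order.
    """
    return [mean(player) for player in zip(*judges)]


def pearson_r(xs, ys):
    """Product-moment correlation between paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical wall-volley counts and three judges' 1-5 ratings for five players.
    volleys = [12, 18, 9, 22, 15]
    criterion = pooled_ratings([3, 4, 2, 5, 3], [3, 5, 2, 4, 3], [2, 4, 3, 5, 4])
    print(f"validity coefficient: {pearson_r(volleys, criterion):.2f}")
```

Pooling the judges first, rather than correlating with each judge separately, mirrors the "combined judgment rating" described in the text.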


All-Around Athletic Performance 


Program for College Men and Women. For the purpose of
stimulating greater interest and participation in a well-rounded
athletic program, Clevett⁷⁷ of George Williams College has organ-
ized batteries of objective tests (with the exception of gymnastics
and diving) in a variety of sports for both men and women. No
attempt was made to select tests which would measure all of the
fundamentals or elements of a particular sport, but a represen-
tative sampling of the fundamentals has been chosen. The ten
sports used to determine the college championship are shown in
the accompanying table.


74 Russell, Naomi and Lange, Elizabeth, “Achievement Tests in Volleyball for
Junior High School Girls,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. XI, No. 4 (December, 1940), 33-41.

75 Bassett, Gladys, Glassow, Ruth B., and Locke, Mabel, “Studies in Testing Volley-
ball Skills,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. VIII,
No. 4 (December, 1937), 60-72.

76 Brady, George F., “Preliminary Investigations of Volleyball Playing Ability,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 16, No. 1 (March,
1945), 14-17.

77 Clevett, Melvin A., “An All-Around Athletic Championship,” Jr. Health and
Phys. Educ., Vol. VI, No. 3 (March, 1935), 48, 73-74.


216 The Status of Measurement in Physical Education


Sport                     Men                        Women
I. Basketball             Dribble                    Free throw
                          Dribble and shoot          Dribble and shoot
                          Speed pass                 Speed pass
II. Volleyball            Hard service               Same as for men
                          Easy service
                          Set-up
III. Tennis               Service                    Same as for men
                          Volley
                          Driving
IV. Golf                  Driving                    Same as for men
                          Approach shots
                          Putting
V. Baseball               Fungo batting              Same as for men, except
                          Fielding                   larger ball
                          Accuracy throw
VI. Gymnastics            Parallel bars              Tumbling only
                          High horizontal bar
                          Tumbling
VII. Swimming             40-yard free style         20-yard free style
                          Lifesaving                 Lifesaving
                          Fancy diving               Fancy diving
VIII. Track and Field     220-yard dash              50-yard dash
                          Running broad jump         Running broad jump
                          Shot-put                   Baseball throw for distance
IX. Football (men only)   Punt for distance
                          Pass for distance
                          Pass for accuracy
X. Handball (men only)    Speed strokes
                          Back wall return
                          Service




The New York State Program. Stimulated by wartime fitness
needs, the Division of Health and Physical Education of the New
York State Education Department, in cooperation with the Office
of Physical Fitness of the New York State War Council,⁷⁸ developed
tests and standards in a wide variety of physical education activities.
Standards were set in the following six areas for boys and young men:


1. Gymnastics, Apparatus and Tumbling Activities.
2. Individual and Dual and Combative Sports.
3. Track and Field Activities.
4. Swimming and Water Safety.
5. Team Sports.
6. One Hundred Yard Obstacle Course.


Evaluative procedures were developed in the following activities
for girls and young women:

1. Team Sports: Basketball, Field Hockey, Softball, Soccer, Speedball, Volley-
ball.
2. Individual Sports: Archery, Badminton, Bicycling, Bowling, Croquet, Deck
Tennis, Fencing, Handball, Golf, Horseshoes, Horseback Riding, Ice
Skating, Paddle Tennis, Roller Skating, Shuffleboard, Skiing, Snowshoe-
ing, Table Tennis, Tennis, and Tobogganing.
3. Dance: Folk Dance, Modern Dance, Social Dance, Tap Dance.
4. Self Testing Activities: Stunts and Tumbling, Apparatus, and Emergency
Skills.
5. Aquatics: Swimming, Boating, Canoeing.
6. Outing and Camping.
7. Fundamentals of Motor Performance.


An award system was established listing “Merit,” “Excellent,”
and “Superior” achievement standards in each area. Those meeting
standards in all areas could earn an ALL-ROUND award.


While work on the validation and perfection of these standards, 
tests and evaluative procedures was not completed, the program as 
developed was administratively sound, and served effectively to 
stimulate interest and effort in improving the physical education 
programs in the high schools of the State. 

78 New York State Physical Fitness Standards for Boys and Young Men, “A Manual
for Young Men.” New York State War Council, Office of Physical Fitness and
New York State Education Department, Division of Health and Physical
Education, Albany, New York.

New York State Physical Fitness Standards, “Evaluative Procedures in Physical
Education Activities for Girls and Young Women.” New York State War Council,
Office of Physical Fitness and New York State Education Department, Division of
Health and Physical Education, Albany, New York, 1944.


Selected References 


The footnote references in this chapter are all excellent for tests 
of the types indicated. 


Cureton, Thomas K., Jr.: How to Teach Swimming and Diving, Chapter IX.
New York, Association Press, 1934.

Cureton, Thomas K., Jr.: The Teaching and Practice of Intermediate Aquatics,
pp. 89-150. Springfield, Mass., Springfield College, 1935.

These two references offer the best material to be found on technique tests in
swimming.

Dyer, Joanna Thayer: “Revision of the Backboard Test of Tennis Ability,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. IX, No. 1 (March,
1938), 25-31.

A simple and valid test for classifying groups of tennis players whose ability
is unknown. The test may also be used to measure improvement and is recom-
mended as a motivating device.

Edgren, H. D.: “An Experiment in the Testing of Ability and Progress in Basket-
ball,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 1 (March, 1932),
159-171.

Apparently the first attempt reported in the literature to validate a battery
of tests for measuring sport technique.

Haight, Edith C.: “Individual Differences in Motor Adaptations to Rhythmic
Stimuli,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15,
No. 1 (March, 1944), 38-43.

Describes an interesting device for mechanically measuring response to
rhythmic stimuli.


CHAPTER XI 


Knowledge and Information Tests 


The teacher of physical education is concerned with imparting
knowledge of and attitudes in activities as well as skills or techniques,
and therefore should have ways of determining how much informa-
tion the student has acquired. Though very little had been done
prior to 1932 in the field of physical education knowledge and infor-
mation tests, the material presented since then indicates that this
phase of the measurement program will keep pace with the develop-
ment which has taken place in activity tests.

Standardized objective type knowledge tests serve many useful
purposes and set high standards for careful construction. An
increasing number of these tests are appearing in the literature of
health and physical education. Health education has produced
many more, however, and no tests with national norms are yet
available for physical education from any of the test construction
agencies which supply standardized tests in other school subjects.
Nevertheless, even when this picture changes, the need will con-
tinually remain for informal teacher-constructed objective type tests,
which are directed toward course objectives and other local condi-
tions. Teachers should familiarize themselves with the best pro-
cedure in the construction of this type of test, and strive to have their
own tests meet acceptable criteria.

The construction of a knowledge or information test calls for the
use of a somewhat different type of research technique than is
ordinarily used in the construction of an activity test. Though
texts¹ should be consulted for detailed procedure in connection with
the construction and use of objective examinations, a number of
principles may be mentioned here.²

1 See selected references at the end of this chapter.


1. A critical survey of subject matter should be made. 

2. Directions for the test must be very explicit. 

3. It is important to sample extensively and carefully and include 
a fairly large number of items. The working time of the student 
and frequency of tests must be given consideration. 

4. Ambiguity must be avoided but answers must not be too 
evident. 


5. The range of difficulty should be such as to avoid either perfect 
or zero scores. 

6. There should be no regular sequence of answers. For example,
alternating true and false responses would undoubtedly become
quite apparent.

7. When possible, items should be arranged according to difficulty.

8. To avoid monotony, various types of exercises should be used
(completion, true-false, recall, multiple-choice, matching), with the
items of each type placed together.
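Principles 5 and 7 both depend on knowing how hard each item is. A minimal sketch, assuming the items are first tried out on a pilot group and scored 1 (right) or 0 (wrong): the difficulty rating is the percentage of pupils passing each item, the items can then be ordered from easiest to hardest, and items that every pupil passed or every pupil failed are flagged, since they add nothing to the spread of scores. The function names are hypothetical.

```python
def difficulty_ratings(answer_sheets):
    """Per-item percentage of pupils answering correctly (1 = right, 0 = wrong)."""
    n_pupils = len(answer_sheets)
    return [100 * sum(column) / n_pupils for column in zip(*answer_sheets)]


def arrange_by_difficulty(labels, answer_sheets):
    """Item labels from easiest (highest % right) to hardest; ties keep their order."""
    ratings = difficulty_ratings(answer_sheets)
    order = sorted(range(len(labels)), key=lambda i: -ratings[i])
    return [labels[i] for i in order]


def non_discriminating(labels, answer_sheets):
    """Items that every pupil passed, or every pupil failed, in the tryout."""
    ratings = difficulty_ratings(answer_sheets)
    return [label for label, pct in zip(labels, ratings) if pct in (0, 100)]


if __name__ == "__main__":
    labels = ["q1", "q2", "q3"]
    tryout = [[0, 1, 1], [0, 1, 0], [1, 1, 0], [0, 1, 0]]  # hypothetical answers
    print(arrange_by_difficulty(labels, tryout))
    print(non_discriminating(labels, tryout))
```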


In most of the knowledge and information tests cited here, recog-
nition has been given to these principles.


(A) In Physical Education Activities


Badminton. Scott's³ written objective badminton examination
consists of forty-seven multiple-choice and thirt… items. The
battery was scientifically constructed: each item was evaluated for
its discrimination index and difficulty, and reliability coefficients
were computed by the Spearman…


2 Adapted from Greene, Harry A., Jorgensen, Albert N., and Gerberich, J. Ray-
mond, Measurement and Evaluation in the Secondary School, Chapter VIII.
New York, Longmans, Green and Company, 1943.

3 Scott, M. Gladys, “Achievement Examinations in Badminton,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 12, No. 2 (May, 1941), 242-255.




Phillips'⁴ badminton knowledge test is a scientifically constructed
battery composed of forty-five true-false and fifty-five multiple- 
choice items covering rules, fundamentals of strokes, strategy of 
singles and doubles play, care and selection of equipment, terminol- 
ogy, flights and returns, and history. The difficulty ratings of the 
items range from 7 per cent to 93 per cent with an average for the 
battery of 50 per cent. Curricular validity was assured by the use 
of fourteen judges. The reliability coefficient reported is .921 when 
the battery is scored with corrections for guessing. T-score norms 
and percentile ranks are available for both beginners and inter- 
mediates. 
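Phillips' reliability figure assumes scoring with a correction for guessing, and her norms are given as T-scores. The sketch below uses the common correction R − W/(k − 1) for items with k choices (which reduces to rights minus wrongs for true-false), and the linear T-score, T = 50 + 10(X − M)/σ. Both are assumptions: the text does not give Phillips' exact scoring rule, and McCall's original T-scale, long used in physical education, is derived from percentiles rather than computed linearly as here. Function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def corrected_score(rights, wrongs, choices):
    """Score corrected for guessing: R - W / (k - 1).

    Omitted items are neither credited nor penalized. For true-false
    items (k = 2) this reduces to rights minus wrongs.
    """
    if choices < 2:
        raise ValueError("an item needs at least two choices")
    return rights - wrongs / (choices - 1)


def t_scores(raw_scores):
    """Linear T-scores: the norm group's mean becomes 50 and its SD becomes 10."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are equal; T-scores are undefined")
    return [50 + 10 * (x - m) / sd for x in raw_scores]


if __name__ == "__main__":
    # Hypothetical pupil: 45 true-false items (35 right, 8 wrong, 2 omitted)
    # and 55 multiple-choice items assumed to have 4 choices (40 right, 12 wrong).
    total = corrected_score(35, 8, 2) + corrected_score(40, 12, 4)
    print(total)
    print([round(t) for t in t_scores([48, 55, 63, 70, 74])])
```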

Baseball (Playground).⁵ This test consists of 100 true-false
statements on game rules and strategy for fifth and sixth grade boys. 
A detailed study was conducted to determine difficulty of questions 
and the relationship between the skill and knowledge test; the 
correlation between the two was low. The correlation between 
chance halves (reliability coefficient) according to Spearman’s 
formula was r=.89. Such an examination will be found valuable 
for use as a teaching tool. 

Basketball. Schwartz'⁶ basketball knowledge test which is
scaled with her achievement test is described on page 198. 

An Appreciation Test in Dance. Murray⁷ developed a pencil-
and-paper test to measure space appreciation in dance. It can be
administered to a group of beginning college students of any size
within thirty or forty-five minutes. Because of the many dimensions
of space associated with dance movement, the test was restricted
to the use of floor patterns. In the test the student rates ten floor
patterns in order of their appeal to him, considering their interest
and best use of space.

Because of the wide diversity of opinions among experts, an order
of merit of the floor patterns was developed from the pooled opinions

4 Phillips, Marjorie, “Standardization of a Badminton Knowledge Test for
College Women,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 17, No. 1 (March, 1946), 48-63.

5 Rodgers, Elizabeth G. and Heath, Marjorie L., “An Experiment in the Use of
Knowledge and Skill Tests in Playground Baseball,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. II, No. 4 (December, 1931), 113-151.

6 Schwartz, Helen, “Knowledge and Achievement Tests in Girls' Basketball on
the Senior High School Level,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. VIII, No. 1 (March, 1937), 143-156.

7 Murray, Josephine Ketcik, An Appreciation Test in Dance. Unpublished
master’s thesis, University of California at Los Angeles, 1943.




of 350 students enrolled in general dance classes of the University
of California at Los Angeles. Intergroup correlations of the order
of merit by groups of fifty students ranged from .759 ± .094 to
.976 ± .014. Intergroup correlations derived by groups of 100
students were identical and came to .976 ± .120. Results obtained
in the retesting indicate a rather high degree of reliability: groups
of fifty students agreed to the extent of .844 and .964, while
the first group of 100 correlated with its previous selection as high
as .964.
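Murray's intergroup correlations compare two orders of merit for the same ten floor patterns. The text does not name the formula; for two untied rankings the rank-difference (Spearman) coefficient, ρ = 1 − 6Σd²/(n(n² − 1)), is the standard choice and is assumed in the sketch below. The ± figures reported alongside the coefficients are not reproduced here. The function name and rankings are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Rank-difference correlation for two untied rankings of the same items.

    Each argument gives the rank (1 = most preferred) that one group assigned
    to item 1, item 2, ..., item n.
    """
    n = len(ranks_a)
    expected = list(range(1, n + 1))
    if n < 2 or len(ranks_b) != n:
        raise ValueError("need two rankings of the same two or more items")
    if sorted(ranks_a) != expected or sorted(ranks_b) != expected:
        raise ValueError("each ranking must use the ranks 1..n exactly once")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    group_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical pooled order
    group_2 = [2, 1, 3, 4, 6, 5, 7, 8, 10, 9]  # neighboring patterns swapped
    print(round(spearman_rho(group_1, group_2), 3))
```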

Field Hockey for Women.⁸ In setting up this test the author
selected test items from three sources. The list of items is based on:
(1) an analysis of previous tests in the field, (2) the official hockey
rule book, and (3) the rating sheet of the written test given to those
who wish to qualify as national umpires. Three equivalent exam-
ination forms were constructed according to scientific procedures,
and each item was validated according to the percentage of error of
nationally rated and nonrated umpires. The reliability of each of
the three forms was secured by correlating two equal halves on the
basis of alternate items. These coefficients, ranging between .88 and
.92, indicate a satisfactory degree of reliability.

Golf.⁹ The material for the test was selected from 200 items in
six of the leading texts on golf and twelve articles. The test itself
consists of fifty true-false statements on general technique informa-
tion, thirteen completion statements under the heading of “Recall
of General Technique,” and thirty statements requiring matching
of terms on general information. The reliability coefficient was
computed by Spearman’s formula for chance halves (r = .86). The
original article has been supplemented by a further study showing
the T-scale technique applied to the scores on test items.

Ice Hockey.¹⁰ Eighteen questions and true-false statements are
offered, based upon the rules for ice hockey. No attempt has been
made to validate the test or compute a reliability coefficient.


8 Grisier, Gertrude J., “The Construction of an Objective Test of Knowledge and
Interpretation of the Rules of Field Hockey for Women,” Suppl. to Res. Quart.
Am. Phys. Educ. Assoc., Vol. V, No. 1 (March, 1934), 79-81.

9 Murphy, Mary Agnes, “Criteria for Judging a Golf Knowledge Test,” Res.
Quart. Am. Phys. Educ. Assoc., Vol. IV, No. 4 (December, 1933), 81-88.
Murphy, Mary Agnes, “Grading Student Achievement in Golf Knowledge,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1 (March, 1934), 85-90.

10 Brown, Harriet M., “The Game of Ice Hockey,” Jr. Health and Phys. Educ.,
Vol. VI, No. 1 (January, 1935), 28, 54-55.




Soccer.¹¹ As in their knowledge test for playground baseball,
the authors have set up an examination consisting of 100 true-false
statements regarding playing regulations and game situations.
Lacking an objective criterion by which to validate the test, it was
necessary to rely on two other possibilities: (1) choice of material
and (2) increase of accomplishment with successive ages. The
coefficient of reliability of the test, determined by Spearman's
formula for the correlation between chance halves, proved to be .903.
A … positive relationship exists between the skill and knowledge
tests.

Soccer Rules.¹² The Women’s Committee on soccer rules
formulated a series of thirty-five questions for the purpose of giving
beginners a better understanding of the game. The questions are
of three types (true-false, multiple-choice and completion) and
are listed with the article.

Swimming.¹³ For the purpose of devising an objective exami-
nation on swimming to be used to measure the level of understanding,
to classify students on the basis of knowledge, and to assign marks,
Scott developed two written tests. The elementary form consists of
thirty multiple-choice and twenty-six true-false items; the intermediate
form, twenty-two multiple-choice and thirty-six true-false items. Final
selection of items was made on the basis of the difficulty rating and
the index of discrimination for each item. Reliability coefficients
of .888 for the Elementary Examination and .867 for the Inter-
mediate Examination are reported.

Tennis. Besides offering a skill test to beginners, Wagner¹⁴
outlines a written test on tennis embracing knowledge of rules, court
positions and tactics, and good form in the strokes practiced. Ten
sample questions of the multiple-choice type are offered. No attempt
was made to validate the test, but it appears to be very usable with
beginning groups.


11 Heath, Marjorie L. and Rodgers, Elizabeth G., “A Study of the Use of Knowl-
edge and Skill Tests in Soccer,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III,
No. 4 (December, 1932).

12 Knighton, Marian, “Soccer Questions,” Jr. Health and Phys. Educ., Vol. I,
No. 8 (October, 1930), 29, 60.

13 Scott, M. Gladys, “Achievement Examinations for Elementary and Inter-
mediate Swimming Classes,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. XI, No. 2 (May, 1940), 100-111.

14 Wagner, Miriam M., “An Objective Method of Grading Beginners in Tennis,”
Jr. Health and Phys. Educ., Vol. VI, No. 3 (March, 1935), 24-25, 79.




From an original test of 200 items Hewitt¹⁵ has constructed two
forms of a comprehensive tennis knowledge test each with fifty 
true-false, multiple-choice, diagrammatic, completion, yes and no, 
and matching items. The reported reliability coefficient using the 
Spearman-Brown formula is .947 for the 100 questions. A .939 
correlation coefficient was found between this tennis knowledge 
test and Dyer’s Tennis Playing Ability Test, although other similar 
studies tend to show much lower correlations between knowledge 
and playing ability. 

Scott¹⁶ developed two tests of tennis knowledge, one for elemen-
tary and one for intermediate groups. Original items for the tests
were selected on the basis of material considered important by
members of the Research Committee of the Central Association of
Physical Education for College Women. Items for the final batteries
were selected on the basis of a difficulty rating (percentage passing
the item) and an index of discrimination, which can be expressed by
the formula (Mean rights − Mean wrongs). Items with a difficulty
rating of more than 95 for the elementary battery and 96 for the
intermediate battery were eliminated, as were those which did not
meet a minimum discrimination index of 5.0. The final battery of
the Elementary Tennis Examination includes twenty-five multiple-
choice and forty-one true-false items, and the Intermediate Tennis
Examination, twenty-one multiple-choice and thirty true-false items.
Reliability for the batteries was determined by correlating the odd-
even items and correcting for actual length by the Spearman-Brown
formula. Reliability coefficients of .87 for the elementary examina-
tion and .78 for the intermediate examination are reported. In this
study the author found that a student’s knowledge of tennis is not
directly related to skill, and that skill and knowledge advance at
different rates.
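Scott's two selection rules can be applied mechanically to tryout data. A minimal sketch, assuming each answer is scored 1 (right) or 0 (wrong) and that "Mean rights" and "Mean wrongs" are the mean total scores of the pupils who answered the item correctly and incorrectly: an item is dropped if more than the difficulty limit passed it (95 per cent for the elementary battery) or if its index falls below the minimum (5.0). Function names are hypothetical, and the total score here includes the item itself, a simplification Scott may not have used.

```python
from statistics import mean


def discrimination_index(item, answer_sheets):
    """Mean total score of pupils passing `item` minus that of pupils failing it."""
    totals = [sum(sheet) for sheet in answer_sheets]
    rights = [t for sheet, t in zip(answer_sheets, totals) if sheet[item] == 1]
    wrongs = [t for sheet, t in zip(answer_sheets, totals) if sheet[item] == 0]
    if not rights or not wrongs:
        return None  # undefined when every pupil passes, or every pupil fails
    return mean(rights) - mean(wrongs)


def select_items(answer_sheets, max_difficulty=95.0, min_discrimination=5.0):
    """Indexes of items that pass both the difficulty and discrimination rules."""
    n_pupils = len(answer_sheets)
    kept = []
    for item in range(len(answer_sheets[0])):
        pct_right = 100 * sum(sheet[item] for sheet in answer_sheets) / n_pupils
        index = discrimination_index(item, answer_sheets)
        if pct_right <= max_difficulty and index is not None and index >= min_discrimination:
            kept.append(item)
    return kept


if __name__ == "__main__":
    # Four hypothetical pupils and four items; the minimum index is scaled
    # down because totals on so short a test cannot differ by 5 points.
    sheets = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
    print(select_items(sheets, max_difficulty=95, min_discrimination=2.0))
```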

Information Tests in Health and Physical Education for 
High School Boys.¹⁷ The need for standardized information
tests to measure methods and techniques of some phases of the high 


15 Hewitt, Jack E., “Comprehensive Tennis Knowledge Test,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. VIII, No. 3 (October, 1937), 74-84.

16 Scott, M. Gladys, “Achievement Examinations for Elementary and Inter-
mediate Tennis Classes,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and
Rec., Vol. 12, No. 1 (March, 1941), 40-49.

17 Hemphill, Fay, “Information Tests in Health and Physical Education for High
School Boys,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 4 (December,
1932), 85-96.




school physical education program for boys gave rise to this rather
detailed study. The phases of the program included in the study
are: (1) major athletic activities (baseball, football, basketball),
(2) minor sports (soccer, tennis, handball, volleyball), (3) health
related to physical education, (4) self-defense (boxing and wres-
tling), (5) recreational sports (golf, hiking, fishing and hunting,
swimming, boating and canoeing, riding and horsemanship, camping
and picnicking, horseshoes).

Criteria were set up for use in preparing the preliminary tests
and as a basis for selecting items in the final test forms. “The final
form of the tests was found by experimentally validating the indi-
vidual test items.” The coefficient of reliability for the test was
determined by the application of the Spearman formula for the
correlation between chance halves; the r’s run from .666 to .877.

The test questions (751 in number) are of two types: (1) true-
false statements, and (2) multiple-choice. Sample questions are
offered, as are tentative grade norms for grades 8 to 11.


The Minnesota Physical Education Knowledge Tests.¹⁸
A comprehensive program in physical education knowledge test
construction has resulted in the presentation to the profession of a
very fine set of tests in ten different activities and hygiene, all
properly validated and with reasonably high reliability coefficients.
The activities in this broad testing program include:

1. Archery
2. Baseball
3. Basketball
4. Fundamentals
5. Golf
6. Hockey
7. Riding
8. Soccer
9. Tennis
10. Volleyball
11. Hygiene

Each test is made up of forty-five questions of the multiple-choice
type, with the exception of hygiene, in which eighty-five questions
are listed.

“The scores made on the knowledge tests are in most cases com-
bined with scores made on practical tests in the same field and used
as a basis of classification,”¹⁹ and to determine the semester grade.


18 Developed by the Department of Physical Education for Women, University
of Minnesota, Catherine Snell, Chairman.
“Physical Education Knowledge Tests,” Res. Quart. Am. Phys. Educ. Assoc.,
Vol. VI, No. 3 (October, 1935), 78-94; Vol. VII, No. 1 (March, 1936), 75-82;
Vol. VII, No. 2 (May, 1936), 77-91.

19 Snell, Catherine, op. cit., Vol. VI, No. 3 (October, 1935), 78.




Knowledge Test on Source Material.²⁰ This test is for use by
individuals, both teachers and students, whose major interest is in
the field of physical education, health or recreation. No attempt
has been made to validate particular test items beyond the opinion
of the writer, but the test should be very useful in creating “interest
in a working knowledge of source materials in physical education.”
Test questions are listed under the following general headings and 
are weighted according to the opinion of the writer: (1) abstracts, 
(2) bibliographies, (3) book reviews, (4) dictionaries and encyclo- 
pedias, (5) foundations, (6) government agencies, (7) history, (8) 
indexes, (9) lists of names, (10) news notes and editorials, (11) peri- 
odicals, (12) professional associations, (13) publishers, (14) research 
and statistics, (15) special services. In all, 160 questions are asked 
with a scoring scheme having a possible 1000-point range. 

National Officials’ Rating Committee Tests.²¹ The National
Officials’ Rating Committee of the National Section on Women’s
Athletics provides knowledge tests in basketball, tennis, softball,
and volleyball for use in rating officials in these sports. The tests
are primarily concerned with material relative to officiating, and are
available only to local rating boards. French²² has constructed the
officials’ knowledge rating test for the United States Field Hockey
Association, also available only to local rating centers.

Knowledge Tests for Activities in the Major Curriculum.²³
French constructed sixteen knowledge tests covering the technique
courses in the first two years of the professional curriculum for
women majoring in physical education at the State University of
Iowa. The activities included:


1. Badminton
2. Basketball
3. Body Mechanics
4. Canoeing
5. Field Hockey
6. Folk Dancing
7. Golf
8. Recreational Sports
9. Rhythms
10. Soccer
11. Softball
12. Stunts and Tumbling
13. Swimming
14. Tennis
15. Track and Field
16. Volleyball


20 Sefton, Alice Allene, “Knowledge Test on Source Material in Physical Education
Including Aspects of Health Education and Recreation,” Res. Quart. Am. Phys.
Educ. Assoc., Vol. VII, No. 2 (May, 1936), 124-156.

21 National Section on Women's Athletics, Women's National Officials’ Rating
Committee, “Officials’ Written Tests in Basketball, Tennis, Softball, and Volley-
ball.” Mimeographed. Tests revised frequently.

22 French, Esther, “Knowledge for Field Hockey Officials,” United States Field
Hockey Association (1939, 1940, 1941).

23 French, Esther, “The Construction of Knowledge Tests in Selected Professional
Courses in Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 14, No. 4 (December, 1943), 406-424.




All tests were carefully constructed according to the most advanced
techniques in this area. The curricular validity of the tests was
assured by careful review of course content, including a review of
course objectives. Each item in each battery was evaluated by a
long list of rigid criteria. Items for the final batteries were selected
on the basis of their difficulty rating and a discrimination index of
(Mean rights − Mean wrongs). A short form and a long form were
constructed for each activity. Reliability coefficients range from
.702 to .884 for the long forms, and from .619 to .878 for the short
forms. Norms were established for the short tests. Tests such as
these make a valuable contribution to the field, and the techniques
used warrant careful study.


(B) In Health Knowledge, Habits and Attitudes 


Because of the increasing number of objective type tests in Health
Education, synopses of several follow to indicate the general type
available, while others of equal worth are listed with their sources.

Gates-Strang Health Knowledge Tests.²⁴ These tests have
a number of purposes, among which are: (1) to discover deficiencies
in health knowledge, (2) to outline the facts which need to be taught
to various groups of children, and (3) to note the progress which
pupils are making in health knowledge. The sixty-four items in each
set of tests have been formulated in accordance with well-recognized
criteria and include a wide range of topics, among which are foods,
recreation, first aid and care of the body. The questions are of the
multiple-choice type with five possible choices. An example will
illustrate the type of statements:²⁵

“24. One reason why green leafy vegetables are healthful is that
they contain:
   protein and sodium.
   starch and sugar.
   iron and vitamins.
   fat and potassium.
   carbohydrates and water.”

Directions for administering the tests are offered separately.

24 Gates, Arthur I. and Strang, Ruth, Gates-Strang Health Knowledge Tests
(Forms for elementary and high school). New York, Bureau of Publications,
Teachers College, Columbia University, 1936.

25 Gates, Arthur I. and Strang, Ruth, op. cit., Form I, p. 4.



Franzen Health Education Tests (American Child Health
Association).²⁶ After setting up criteria for evaluating the tests,
material was gathered from seventy schools in various sections of
the country representing various strata of economic welfare. The
subject matter of the test was classified into fifteen divisions by four
competent judges, and a group of five tests comprises the final battery.
These tests may be given singly, but when the entire battery is 
given the order indicated should be followed. Norms are available 
for grades 5 and 6. 

Wood-Lerrigo Health Behavior Scale.²⁷ While this is not in
the nature of an educational test, it should be mentioned here because
it is an attempt to present the outcomes of health instruction.
Standards of achievement in the form of six scales have been set up
which indicate the health habits, attitudes and knowledge which are
significant and should be acquired by children:

1. Before entering kindergarten.
2. At the completion of the third grade.
3. At the completion of the sixth grade.
4. At the completion of the ninth grade.
5. At the completion of the twelfth grade.
6. This scale lists standards appropriate for adults.


Each scale is divided into three general parts: (1) The Healthy 
Organism, (2) The Healthy Personality, and (3) The Healthy 
Home and Community. Each of these is further subdivided and 
appropriate habits, attitudes and knowledge standards listed. An 
example may be taken from Scale II:²⁸


I. The Healthy Organism.
   (A) Nutrition
      Habits or Skills
         6. Does not drink tea or coffee.
      Attitudes
         1. Enjoys eating in clean, neat surroundings in
            leisurely fashion.
      Knowledge
         2. Knows how much milk and water should be
            used daily.


26 Franzen, Raymond, Health Education Tests, School Health Research Mono-
graph, No. 1. New York, American Child Health Association, 1929. Pp. xx
and 70.

27 Wood, Thomas D. and Lerrigo, Marion Olive, Health Behavior. Bloomington,
Illinois, Public School Publishing Company, 1930. Pp. ix and 150 and 32.

28 Wood, Thomas D. and Lerrigo, Marion Olive, op. cit., pp. 39-40.


Health Knowledge Test for Adults.²⁹ After a survey of
sources of material, the authors formulated a preliminary test con-
sisting of seventy-nine true-false, forty-six multiple-choice (five
choices), and eighteen completion statements. The test was given to
over 1000 freshmen and several groups of students, and the results
are shown in tabular form. The average of the college freshman
group on the test was 39 per cent.

Health Knowledge Test for College Freshmen.³⁰ As a first
step in the study of the college freshman’s knowledge of personal
hygiene, Rooks set up a true-false type of examination covering
twenty-one headings in the field of health knowledge, combined
into four different tests of eighty items each. Two methods of
validation were attempted: (1) results of trial tests, (2) expert
medical opinion. The application of the tests indicated a decided
lack in knowledge of personal hygiene on the part of college fresh-
men, a “lack in knowledge of simple structure and normal function-
ing of the human body,”³¹ failure to discriminate between genuine
and false advertising pertaining to health and disease, and a con-
tinued adherence to fallacies in the field of health.

Health Knowledge Test for High School Seniors and College
Freshmen.³² Kilander's test of health knowledge is designed for
three groups: high school students, college students and adults.
One hundred multiple-choice questions from the fields of nutrition,
safety, first aid, community hygiene and sanitation, mental and
social hygiene, common errors and superstitions, and general fields
of health comprise the battery. Questions for the final battery were
selected on the basis of a difficulty rating set so that college students
completing a course in hygiene could score 75 or better. Tentative
norms are given for high school seniors and college freshmen.


29Forsythe, Warren E. and Rugen, Mabel E., “A Health Knowledge Test,” Res.
Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 2 (May, 1935), 105-120.

30Rooks, Roland, “The College Freshmen's Knowledge of and Interest in Personal
Hygiene,” Suppl. to Res. Quart. Am. Phys. Educ. Assoc., Vol. VI, No. 3 (October,
1935), 51-80. Excerpt from a doctor's thesis, State University of Iowa.

31Rooks, Roland, Ibid., p. 65.

32Kilander, H. F., “Health Knowledge of High School and College Students,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. VIII, No. 3
(October, 1937), 5-32. (Revised 1948)




First Aid Test.33 Doscher has developed two seventy-six-item
batteries composed of multiple-choice questions (four-choice) based
on the American Red Cross First-Aid Textbook. The reliability
coefficients of the tests determined by the Spearman-Brown formula
are .65 for Test A and .84 for Test B. The indices of reliability are
.81 for Test A, and .94 for Test B. Difficulty ratings for each
question are also given.
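The two statistics named here can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions: the Spearman-Brown formula steps a half-test correlation up to full-test length, r(full) = 2r(half) / (1 + r(half)), and the index of reliability is the square root of the reliability coefficient. The half-test correlation of .48 used below is hypothetical, not Doscher's data; it steps up to about .65, whose square root is about .81, the pair reported for Test A.

```python
import math


def spearman_brown(r_part: float, factor: float = 2.0) -> float:
    """Predicted reliability of a test lengthened by `factor`, given the
    correlation between its parts (factor=2 steps split halves up to full length)."""
    return factor * r_part / (1 + (factor - 1) * r_part)


def index_of_reliability(r_full: float) -> float:
    """Square root of the reliability coefficient: the estimated correlation
    between obtained scores and true scores."""
    return math.sqrt(r_full)


# Hypothetical half-test correlation (not Doscher's data).
r_half = 0.48
r_full = spearman_brown(r_half)
print(f"full-test reliability: {r_full:.2f}")                         # 0.65
print(f"index of reliability:  {index_of_reliability(r_full):.2f}")  # 0.81
```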

Diet and Dental Health Test.34 Gold's test of diet and dental
health knowledge, in two equated forms, is designed for use in grades
seven, eight and nine. Curricular validity of content was established
on the basis of authoritative judgments, and validity was further
indicated by norm scores that increase as grade level advances.

Health Practice Inventory.35 Johns' Health Practice Inven-
tory may be used to measure the health practices of an individual or
group. It covers nutrition, excretion, exercise, posture, defense
against disease, accidents, habit-forming substances, personal and
group maladjustments, and other aspects of personal hygiene. This
type of inventory is especially useful for pretesting in health education
classes as a guide to teaching emphasis.

Health Attitude Scale.36 Byrd's Health Attitude Scale is a
one-hundred-item battery designed to appraise the attitudes of
senior high school and junior college students on health matters.
The answers are scaled from one to five, with responses recorded in
terms of strongly agree, agree, undecided, disagree, and strongly
disagree. Items for the final battery were selected from an original
400 items on the basis of their power to discriminate between the
best 10 per cent and the poorest 10 per cent of 1700 subjects. This
test, dealing with the complex problem of attitude measurement,
provides a practical tool for teachers working with new groups.
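The item-selection step can be sketched in Python. Byrd's exact computation is not described here, so the sketch rests on two assumptions: subjects are ranked by their total score on the scale, and an item's discriminating power is the difference between the mean response of the top 10 per cent and that of the bottom 10 per cent. The function names and the tiny data set are hypothetical.

```python
def discrimination_indices(responses, fraction=0.10):
    """responses: one list of item scores (1 to 5) per subject.
    Returns, for each item, the upper-group mean minus the lower-group mean,
    where the groups are the top and bottom `fraction` of subjects by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, int(len(ranked) * fraction))   # e.g. 170 of 1700 subjects
    upper, lower = ranked[:n], ranked[-n:]
    return [
        sum(s[i] for s in upper) / n - sum(s[i] for s in lower) / n
        for i in range(len(responses[0]))
    ]


def select_items(responses, keep, fraction=0.10):
    """Indices of the `keep` items with the largest discrimination index."""
    d = discrimination_indices(responses, fraction)
    best = sorted(range(len(d)), key=lambda i: d[i], reverse=True)[:keep]
    return sorted(best)


# Ten hypothetical subjects, two items: item 0 separates high and low
# scorers, item 1 draws the same response from everyone.
data = [[5, 3], [5, 3], [4, 3], [4, 3], [3, 3], [3, 3], [2, 3], [2, 3], [1, 3], [1, 3]]
print(discrimination_indices(data))  # [4.0, 0.0]
print(select_items(data, keep=1))    # [0]
```

In a full revision the same ranking would be applied to all 400 trial items, and only the most discriminating would be kept for the final battery.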

Health Education Test: Knowledge and Application.37
Shaw and Troyer's test was constructed for use in hygiene courses
in grades seven through twelve and the first year of college. There
are two forms, each consisting of 100 true-false and multiple-choice
items. The forty true-false items deal with student reactions to
problem situations, while the sixty multiple-choice items apply to
important health facts and concepts. Topics covered include first
aid, safety, disease prevention and control, lighting, ventilation,
temperance, care of the special senses, dental hygiene, social health
and child care. Items were selected from the original experimental
forms on the basis of difficulty ratings between 10 and 90 per cent.
Reliability coefficients of the two forms, determined by the
split-halves method, are .90 and .91.

33Doscher, Nathan, “Two First-Aid Examinations for College Students and Adult
Groups,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 14, No. 2
(March, 1945), 228-248.

34Gold, Leah, “A New Test of Health Knowledge,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 16, No. 1 (March, 1945), 34-36.

35Johns, Ned B., Health Practice Inventory. Stanford University, Stanford
University Press, 1943.

36Byrd, Oliver E., Byrd Health Attitude Scale. Stanford University, Stanford
University Press, 1940.

37Shaw, John H., Troyer, Maurice E. and Brownell, Clifford L., Editor, Health
Education Test: Knowledge and Application. National Achievement Tests.
Rockville Center, New York. Acorn Publishing Company, 1947.
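Both procedures mentioned for Shaw and Troyer's test, screening items by difficulty and estimating reliability by split halves, can be sketched briefly in Python. Here difficulty is taken to mean the per cent of pupils answering an item correctly, the halves are assumed to be the odd- and even-numbered items, and the half-test correlation is stepped up with the Spearman-Brown formula. These are common conventions, not details reported for this test, and the four-pupil answer sheet is hypothetical.

```python
import math


def item_difficulty(answers):
    """answers: one list of 0/1 marks per pupil (1 = correct).
    Returns the per cent of pupils answering each item correctly."""
    n = len(answers)
    return [100 * sum(p[i] for p in answers) / n for i in range(len(answers[0]))]


def items_in_range(answers, low=10, high=90):
    """Indices of items whose difficulty falls between low and high per cent."""
    return [i for i, pct in enumerate(item_difficulty(answers)) if low <= pct <= high]


def pearson_r(x, y):
    """Product-moment correlation between two lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(answers):
    """Correlate scores on items 1, 3, 5, ... with scores on items 2, 4, 6, ...,
    then step the half-test r up to full length (Spearman-Brown)."""
    odd = [sum(p[0::2]) for p in answers]
    even = [sum(p[1::2]) for p in answers]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


answers = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(item_difficulty(answers))                   # [75.0, 50.0, 25.0, 0.0]
print(items_in_range(answers))                    # [0, 1, 2]
print(round(split_half_reliability(answers), 2))  # 0.83
```

Item 3, which no pupil answered correctly, falls outside the 10 to 90 per cent band and would be dropped.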

Health and Safety Education Test.38 Crow and Ryan's
health and safety test has been constructed for use in grades three
through six. This ninety-item multiple-choice battery covers
material on good health and safety habits, cause and effect in rela-
tion to health and safety, facts about health and safety, and appli-
cation of health and safety rules. Norms are provided for each
grade. The reported reliability coefficient is .90.

Health Practices, Knowledge, Attitudes and Interests of
Senior High School Pupils.39 These tests were developed in
connection with a state-wide study of health education in the
secondary schools of Massachusetts. Each of the four batteries,
designed to measure (1) practices, (2) knowledge, (3) attitudes,
and (4) interests, consists of ninety items. The per cent answering
each item correctly is indicated, and the norm score for 10,000
students is given.


Additional Tests in Health Education 


Boyer, P. A., Survey Test in Health Education. 6 Forms. Philadelphia,
Pennsylvania: Public Schools, 1940. (Grades 4, 5, 6, 7, 9, and 10).

Cottle, William E. and Moore, Fredrika, Test of Health Awareness. Boston,
Massachusetts Department of Public Health, State House, 1956. (High School).

Denenholz, Sylvia O., “Knowledge Test of Syphilis and Gonorrhea,” Res.
Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. XI, No. 1 (March,
1940), 110-114. (College Women).


38Crow, Lester D., Ryan, Loretta C. and Brownell, Clifford L., Editor, Health
and Safety Education Test. National Achievement Tests. Rockville Center,
New York, Acorn Publishing Company, 1947.

39Southworth, Warren H., Latimer, Jean V. and Turner, Clair E., “A Study of
the Health Practices, Knowledge, Attitudes and Interests of Senior High School
Pupils,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 15, No. 2
(May, 1944), 116-136.




Gill, E. M. and Schrammel, H. E., Gill-Schrammel Physiology Test. Emporia, 
Kansas: Kansas State Teachers College. (High School). 

Health Inventories. Cooperative Study in General Education, American Coun- 
cil on Education. Chicago, Illinois. 

Health Tests for Elementary School Children. Informally constructed batteries
appear frequently in the following two sources. Some of the later ones are
indicated. While not standardized, they serve to suggest test items to the
teacher. Grade Teacher, Vol. 64, No. 9 (May, 1947), 74; Vol. 65, No. 4 (De-
cember, 1947), 67; Vol. 65, No. 6 (February, 1948), 60; Vol. 65, No. 7 (March,
1948), 100; Vol. 65, No. 9 (May, 1948), 62. The Instructor, Vol. 50, No. 9
(September, 1941), 27; Vol. 51, No. 7 (May, 1942), 17.

Hemphill, Fay, “Information Tests in Health and Physical Education for
High School Boys,” Res. Quart. Am. Phys. Educ. Assoc., Vol. III, No. 4 (De-
cember, 1932), 85-86. These tests are described on page 224.

Neher, Gerwin, Health Inventory for High School Students. Los Angeles Test
Bureau, Board of Education, 1944.

Orleans, Jacob and Sealy, Glenn A., Public School Achievement Tests: Health,
Forms 1 and 2. Bloomington, Illinois: Public School Publishing Company,
1928. (Elementary Grades 4 to 8).

Schrammel, H. E. and Brewer, John W., Brewer-Schrammel Health Knowledge
and Attitude Test. 2 Forms. Emporia, Kansas: Kansas State Teachers College,
1935. (Grades 4 to 8).

Speer, Robert and Smith, Samuel, National Achievement Tests: Health Test.
2 Forms. Rockville Center, New York: Acorn Publishing Company, 1938.
(Grades 3 to 8).

Trusler, V. T. and others, Trusler-Arnett Health Knowledge Test. Emporia,
Kansas: Kansas State Teachers College, 1940. (High School).


Selected References 


The footnote references in this chapter are all excellent for tests of the types 
indicated. Practically all of the contributions have been recent and the majority 
of the tests have been formulated with particular attention to correct pro- 
cedures in test construction. It seems fair to state that as much attention has 
been given to this phase of the measurement program in the past few years as to 
the measurement of achievement in skills and general qualities. 


BARR, A. S., BURTON, WILLIAM H. and BRUECKNER, LEO J.: Supervision. New
York, D. Appleton-Century Company, 1947. Pp. 879.

Chapter V, “The Appraisal of the Educational Product”, includes material 
on the selection and use of objective type tests. It also contains a copy of the 
Cole-von Borgersrode Scale for Rating Standardized Tests. 

THE FORTY-FIFTH YEARBOOK OF THE NATIONAL SOCIETY FOR THE STUDY OF
EDUCATION, Part I, The Measurement of Understanding. Mabel E. Rugen
and Dorothy Nyswander, “The Measurement of Understanding in Health
Education,” pp. 215-219. Chicago, Illinois, The University of Chicago Press,
1946. Pp. 538.

Outlines the scope of the objectives of health education, and describes avail- 
able types of measurement tools. 




FRENCH, ESTHER: “The Construction of Knowledge Tests in Selected Professional
Courses in Physical Education,” Res. Quart. Am. Assoc. for Health, Phys. Educ.,
and Rec., Vol. 14, No. 4 (December, 1945), 406-424.


RODGERS, ELIZABETH G.: “The Standardization and Use of Objective Type
Information Tests in Team Game Activities,” Res. Quart. Am. Assoc. for Health,
Phys. Educ., and Rec., Vol. X, No. 1 (March, 1939), 102-112.


HEWITT, JACK E.: “Improving the Construction of the Essay and Objective New
Type Examination,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. X, No. 3 (October, 1939), 148-154.

These three articles deal with the procedures, including statistical techniques, 
for constructing objective type written tests in physical education. 


GREENE, HARRY A., JORGENSEN, ALBERT N. and GERBERICH, J. RAYMOND:
Measurement and Evaluation in the Secondary School, Chapter V and Chapter
VIII. New York, Longmans, Green and Company, 1945. Pp. 670.

These chapters point out the characteristics of informal objective examina- 
tions, their advantages and disadvantages, principles underlying construction, 
and illustrations of types of objective items. 

PARRY, W. W.: “Reading Difficulty of Health Knowledge Tests,” Res. Quart. Am.
Assoc. for Health, Phys. Educ., and Rec., Vol. 16, No. 3 (October, 1945), 206-215.

Reports a study on the word-level difficulty of two standardized health
knowledge tests, and urges that, except for scientific terms, the vocabulary
of tests be selected from words common to pupils.

RUCH, G. M.: The Objective or New Type Examination. Chicago, Scott, Foresman
and Company, 1929. Pp. x and 478.

Part I, “The Argument for Objective Examinations,” pp. 3-148. Of particular
importance in this part of the book are chapters on the following topics: criteria
of a good test, advantages and limitations of objective examinations, and
relative values of standardized and nonstandardized tests.

Part II, “How to Construct an Objective Examination,” pp. 149-278, offers 
complete information regarding the building of an objective test, sets forth 
rules for drafting objective test items, and gives many illustrations of types of 
objective tests. 

Part III, “Experimental and Theoretical Considerations,” pp. 281-404.
Besides a summary of the experimental studies on the merits of various types
of objective test items, this section contains an excellent chapter on “Exam-
inations, Marks and Marking Systems.”

To complete the treatise on objective examinations, the author offers Chapter 
XV, "Statistical Problems Related to Measurement," pp. 405-445. Here are 
presented the seven basic statistical concepts necessary in securing a reasonable 
mastery of the principles of test interpretation. 

SCOTT, M. GLADYS and FRENCH, ESTHER: Better Teaching Through Testing. New
York, A. S. Barnes and Company, 1945. Pp. 179-205.

The chapter on “Construction of Knowledge Examinations” applies the latest
principles in this area directly to problems of constructing knowledge tests in
physical education. It also includes material on scaling, scoring, and revising
examinations.

TINKELMAN, SHERMAN: Difficulty Prediction of Test Items. New York, Teachers 
College Bureau of Publications, Columbia University, 1947. Pp. 55. 

Discusses the problem of determining difficulty ratings, which strongly influence
the validity of test items, when experimental tryout of test batteries is not
feasible.


CHAPTER XII 


Rating Scales 


Recalling the concept that measurement and evaluation must be 
done in terms of the objectives sought should make the student 
cognizant of many factors in physical education not readily appraised 
by the available tests described thus far in this text. Good form in 
sports and rhythmic activities is a major teaching emphasis. Proper 
attitudes toward activity, positive social behavior, and habits of 
good sportsmanship, all become objectives in the well-rounded 
physical education program. In common practice most teachers
make judgments about these factors; simple ratings, or estimations
of qualities or abilities, are continuously used as a part of the teach-
ing process. Rating scales, as measurement tools, serve to increase the
objectivity of these necessary subjective judgments. Simply stated,
a rating scale is a device to assist the teacher or judge to know what 
he is looking for, and to describe more precisely the degree to which 
the quality under examination is present. 

Rating scales are by no means an innovation in the physical
education program. Many of the skill tests in physical education
have been validated on the basis of the combined judgment ratings 
of experts. In fact it can be said that all measurement in its inception 
relies on the opinion of experts. For example, the first intelligence 
test was validated against the judgment of experts as to the relative 
mental abilities of the children involved in Binet’s original experi- 
ment. 

Rating scales are more widely used in physical education testing 
programs than would be indicated by the number of these scales 
appearing in the literature. Most such scales have been prepared
for local use, and play an increasingly important part in evaluation
and grading plans. The recognized use of rating scales is due in part 
to a better understanding by more teachers of the over-all objectives 
of the physical education program, and of the need to appraise by 
the best possible means some of the more intangible objectives. In 
the measurement of facets of human behavior for which adequate 
measurement tools are yet lacking, rating scales serve to bridge the 
gap between wholly objective measurement and random subjective 
opinion. Rating scales, properly constructed and accurately used, 
serve as a useful measurement tool in the physical education pro- 
gram. 

Construction and Use of Rating Scales. While not applicable 
to all types of rating scales, in general a rating scale is constructed 
first, by selecting a trait to be measured; second, by determining the 
range or continuum of the scale; third, by isolating and defining the 
basic elements of the trait under consideration; and fourth, by 
distributing the component elements along the scale. To repeat 
this sequence in terms of specifics: first, one selects the trait to be 
rated: form in bowling. Second, the range or continuum of the scale 
is determined: shall he be rated pass or fail; by class such as excel- 
lent, good, fair, poor; scored from 1 to 5 or 1 to 10 or more; or merely 
checked for positive and negative aspects of the trait; and the like. 
Third, the trait is defined: what precisely is good form in bowling
in terms of initial stance; grip on ball; approach, considering foot-
work, arm swing, body position, eye focus, and coordination; and
release? Fourth, the elements of the trait are arranged in terms of
the selected range of the scale: what specifically characterizes an 
"excellent" bowler or a “10-point” bowler; what does a “poor” 
rating mean in terms of specific elements of bowling form; or how 
many points shall be given for each variable defined? 
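The four construction steps can be mirrored in a small Python data structure for the bowling example. The point allocation below (10 points spread over the elements of form named above) is hypothetical, chosen only to illustrate step four; an actual scale would set these values from expert judgment.

```python
# Step 1: the trait.  Step 3: its elements.  Step 4: points per element.
# The point values are hypothetical and total 10 (step 2: a 0-10 range).
BOWLING_FORM = {
    "initial stance": 1,
    "grip on ball": 1,
    "approach: footwork": 2,
    "approach: arm swing": 1,
    "approach: body position": 1,
    "approach: eye focus": 1,
    "approach: coordination": 1,
    "release": 2,
}


def rate_form(scale, awarded):
    """Sum the points a rater awards, checking that every element is rated
    and that no element exceeds its maximum."""
    missing = scale.keys() - awarded.keys()
    if missing:
        raise ValueError(f"unrated elements: {sorted(missing)}")
    total = 0
    for element, points in awarded.items():
        if element not in scale:
            raise ValueError(f"unknown element: {element!r}")
        if not 0 <= points <= scale[element]:
            raise ValueError(f"{element}: {points} outside 0-{scale[element]}")
        total += points
    return total


print(sum(BOWLING_FORM.values()))  # 10: the top of the scale
marks = {**BOWLING_FORM, "release": 1, "approach: eye focus": 0}
print(rate_form(BOWLING_FORM, marks))  # 8
```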

There are several different types of rating scales. The forms
described below appear to be the most useful for the
physical education program:

1. A descriptive or quality scale includes a list of descriptive phrases
outlining the variables of a given trait. The rater merely checks
those variables which best describe the subject. This type of scale is
widely used as a teaching or guidance device.

2. A class rating scale designates classes for a given trait by such
symbols as excellent, superior, good, fair, poor; A, B, C, D, E; pass
or fail; 4, 3, 2, 1; or by any other definitive symbols. The subject is
placed in the general classification which best describes him in relation
to the trait. In this type of scale the elements of the trait are carefully
defined in relation to the separate classes, e.g., precisely what skills
does an “excellent” performer in the crawl stroke possess; what
characterizes a swimmer rated “poor”?

3. A man-to-man rating scale is closely related to the class rating
scale. It was developed for rating officers in the Army during World
War I. Five categories from Highest to Lowest are set. For each
level the rater selects an individual he knows who typifies that
position, and these individuals serve as the standards. Others in the
group are scored by comparison with these models.

4. A comparison scale offers a standard pictorial scale to which 
the subject specimen is compared. Handwriting scales, and the 
posture scales described in Chapter III, illustrate this method. 

5. A graphic rating scale depicts elements of other types of scales in
diagrammatic form, enabling the rater to visualize the range
and details of the scale better. An example of a class rating scale
presented in graphic form, on which the rater merely checks the
appropriate segment, follows:

/ Poor / Fair / Average / Good / Superior /

The definitive qualities of each class can be listed under each seg-
ment.

6. A numerical rating scale is also a variation on other types of scales.
Here a score value is applied to the variables or the points on the
scale. A range of values is selected, such as 1 to 5 or 1 to 10, with the
highest number usually representing maximum possession of a trait.
The rater assigns the number or score which best indicates the
amount of the trait possessed.

Various combinations of these rating forms can be made. Also 
certain mathematical considerations guide the assignment of 
numerical values to the scales. The sources listed should be con- 
sulted for detailed procedures on the construction and use of rating 
scales.1 A number of principles regarding their use, however, are
suggested. 

1. Ratings should be made on the basis of a single trait at a time,
whenever possible. In rating swimming form, for example, each
stroke or water safety event should be rated separately. Combined
judgments can then follow.

1See Selected References at end of chapter.

2. The rater or judge should be expertly qualified. The rater 
must be able to recognize good form in swimming, for example, when 
he sees it. Merely being familiar with the elements of the rating 
scale is not enough; his experience with those elements determines 
the reliability of his judgment. Similarly in rating behavior traits, 
personality and the like, the rater must know the student to be 
judged sufficiently well in terms of the trait under consideration. 

3. The rater must be unbiased. Prejudicial feelings, whether
positive or negative, about the subject can condition the judgments
made if proper precautions are not taken. This includes the
“halo effect,” in which strong feelings regarding a related trait
or the general qualities of the subject unduly influence the judge in his
rating of the trait under consideration. An example of this is the
teacher who gives an “excellent” rating for swimming form to a
student who, despite marked weakness in form, has been helpful
about the pool or has in some other way distinguished himself in the
rater's estimation. The effect can obviously operate in reverse, with
a low rating given because of unfavorable impressions of aspects of the
student other than the trait under consideration. Teachers especially
must be cognizant of this problem and endeavor to approach ratings
with an objective frame of mind.

4. The rating should be made as close as possible to the time the
performance to be judged is observed. In rating form in physical
education activities, the rating should be made immediately after the
activity is performed, and the rating period should be planned under
much the same circumstances as any other test in the physical
education program.

5. Standardized or pre-prepared rating sheets, preferably of the
graphic type, should be used to insure uniformity and administrative
facility.

6. The rater should check his basic criteria or definitions fre-
quently to assure himself that the same yardstick is being applied
to each subject.

7. Several raters should be used whenever possible and their
judgments combined for a final rating. This procedure increases
reliability.

8. The range for a given scale should be determined on the basis
of the purpose of the rating and the administrative limits of the
situation. The range for rating of skill for screening purposes in a
beginning tumbling class might well be pass or fail, a 2-point scale.
For a final mark in the course, the range could well be increased to
a 5-point or larger scale, but it must be kept within the limits of easy
handling and meaningfulness for the rater.

9. In most cases, in rating physical education activities, the
student being rated should know the details of the rating device.
Similarly, self-ratings and ratings by other students serve as a
valuable teaching device by helping the student to understand the
objectives of the activity.

10. Ratings may be weighted for difficulty. The best example of
this occurs in the official competitive diving rules, where each dive
is given a difficulty rating depending upon its complexity. The
competitor's final score on a dive is his rating on the dive multiplied
by the difficulty rating of the dive. Thus, for equally good execution,
more points can be earned in performing the backward dive with full
twist than the straight backward dive.

11. The validity of ratings should be considered in drawing
conclusions from the results. Some elements or traits are more
validly rated than others. For example, leadership, efficiency,
judgment, quickness and energy lend themselves to more reliable
ratings than such traits as honesty, courage, tact, cooperativeness
and unselfishness.2 Good results may be expected from ratings of
form in physical education activities if the ratings are carefully
made.

12. The limitations of rating scales must be clearly recognized. 
They should not be used when more valid, reliable, and objective testing 
procedures are available to measure the trait concerned. 
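Principle 7, combining the judgments of several raters, can be illustrated with a short Python sketch. Taking the arithmetic mean of each student's ratings is one simple way to combine them and is an assumption here; the text does not prescribe a method. Rater and student names are hypothetical.

```python
from statistics import mean


def combine_ratings(by_rater):
    """by_rater: {rater: {student: rating}}.
    Returns each student's mean rating over the raters who rated him
    (the mean is an assumed combining rule)."""
    students = sorted({s for ratings in by_rater.values() for s in ratings})
    return {
        s: mean(ratings[s] for ratings in by_rater.values() if s in ratings)
        for s in students
    }


ratings = {
    "rater A": {"Ann": 4, "Bob": 2},
    "rater B": {"Ann": 5, "Bob": 3},
    "rater C": {"Ann": 3},          # rater C did not see Bob perform
}
print(combine_ratings(ratings))    # Ann 4, Bob 2.5
```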


Activity Rating Scales 


The Rating of Player Performance in Basketball. Elbel and
Allen3 have developed a weighted rating scheme for the purpose of
more accurately evaluating team and individual performance in
basketball competition. Ten positive items, i.e., field goals, free
throws, assists, and the like, and nine negative items, i.e., errors of
omission, offensive fouls, violations and the like, are weighted from
one to ten. A tabulator observes each player throughout the game
and scores him in terms of the items on the chart. At the end of a
game or season, a total score for each individual may be computed by
subtracting the sum of the weighted negative scores from the sum of
the weighted positive scores. Critical analysis of the scoring on indi-
vidual items reveals significant strengths and weaknesses of players,
and the plan has proved to be a valuable coaching device for those
using it.

2Remmers, H. H. and Gage, N. L., Educational Measurement and Evaluation,
p. 566. New York, Harper and Brothers, 1943.

3Elbel, E. R. and Allen, Forrest C., “Evaluating Team and Individual Perform-
ance in Basketball,” Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec.,
Vol. 12, No. 3 (October, 1941), 538-555.
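The scoring rule just described, the sum of weighted positive items minus the sum of weighted negative items, can be sketched in Python. Only a few of the nineteen items are listed, and the weights (within the one-to-ten range) are hypothetical, not Elbel and Allen's published values. Returning the per-item breakdown as well as the total supports the item-by-item analysis the authors recommend.

```python
# Hypothetical weights on a 1-10 scale; not the published Elbel-Allen values.
POSITIVE = {"field goal": 5, "free throw": 3, "assist": 4}
NEGATIVE = {"error of omission": 2, "offensive foul": 4, "violation": 3}


def weighted_items(tally):
    """tally: {item: times recorded by the tabulator}.
    Returns signed weighted points for each item."""
    points = {}
    for item, count in tally.items():
        if item in POSITIVE:
            points[item] = POSITIVE[item] * count
        elif item in NEGATIVE:
            points[item] = -NEGATIVE[item] * count
        else:
            raise ValueError(f"item not on the chart: {item!r}")
    return points


def player_total(tally):
    """Sum of weighted positive items minus sum of weighted negative items."""
    return sum(weighted_items(tally).values())


game = {"field goal": 4, "free throw": 2, "assist": 3, "offensive foul": 1, "violation": 2}
print(weighted_items(game))
print(player_total(game))  # 20 + 6 + 12 - 4 - 6 = 28
```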

Voltmer and Watts4 also suggest a rating scale of player ability
in basketball. Their scheme requires comparatively few recorders,
eliminates the opinion of scorers as much as possible, and presents an
adequate picture of actual performance during the game. Players
are scored on ten different performances, five of which produce points
“for” and five of which produce points “against.” A “for” score
card and an “against” score card are provided, each of which
accommodates records for the entire team. Two scorers keep these
records, and information on field goals and free throws made and
fouls committed is gleaned from the official scorebook. A summary
chart is provided to better illustrate each individual's record. The
authors feel that this plan is especially valuable in motivating players
to improve, helping to diagnose strengths and weaknesses, ruling out
bias in player selection, and evaluating total scores for the entire team
in separate games.

Diving Rating Scales. The classic rating scale in the sports
field is that of the Official Diving Rules of the National Collegiate
Athletic Association. Individual dives are scored on a 10-point
scale, with numbers from 6 to 10 scored at half-point intervals if
necessary. The class ranges of the numbers are: completely failed,
unsatisfactory, deficient, satisfactory, good and very good. The
dives are analyzed in terms of the run, take-off, technique and grace
in the air, and entry into the water. Dives for various types of
competition are divided into compulsory and optional groups, and
each is given a difficulty rating. Complete directions for conducting
and scoring meets are given in the listed source.5 To further aid the
judge, a pictorial chart showing correct positions of head, arms and
legs for a number of springboard and platform dives is included.

4Voltmer, E. F. and Watts, Ted, “A Rating Scale of Player Performance in
Basketball,” Jr. of Health and Phys. Educ., Vol. XI, No. 2 (February, 1940),
-95.

5National Collegiate Athletic Association, The Official Swimming Guide,
“Official Rules for Swimming, Fancy Diving, and Water Polo,” pp. 164-189.
New York, A. S. Barnes and Company, 1947.
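The arithmetic of a weighted dive score (principle 10 above) can be sketched in Python. The award checks follow the description given here: awards run from 0 to 10, with half points used only from 6 up. Averaging several judges' awards before multiplying by the difficulty rating is an assumption made for illustration; the official rules in the cited guide govern how judges' awards are actually combined. The difficulty values are hypothetical.

```python
def check_award(award):
    """An award is 0-10; half points are allowed only from 6 to 10."""
    if not 0 <= award <= 10:
        raise ValueError(f"award {award} outside 0-10")
    if award * 2 != int(award * 2):
        raise ValueError(f"award {award} is not in half-point steps")
    if award < 6 and award != int(award):
        raise ValueError(f"half points are used only from 6 to 10, got {award}")


def dive_score(awards, difficulty):
    """Mean judges' award (an assumed combining rule) times the difficulty rating."""
    for a in awards:
        check_award(a)
    return sum(awards) / len(awards) * difficulty


# The same awards earn more on a dive with a higher (hypothetical) difficulty.
awards = [7, 7.5, 6.5]
print(round(dive_score(awards, 1.2), 2))  # 8.4
print(round(dive_score(awards, 1.8), 2))  # 12.6
```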

Bennett6 suggests a fifty-item test of diving, which ranges from
simple stunts to difficult dives. The final score for the test is the 
number of items passed according to the standards given for each 
item. An unlimited number of trials is allowed for each event. 
Validity was determined by comparing the results on this test with 
the results for a number of subjects scored by experts according to 
official diving rating rules. A validity coefficient of .95 was obtained 
by this method. A reliability coefficient of .90 was found when the 
odd and even items of the battery were correlated. 

Ratings in Riding Competition.7 Crabtree suggests a rating
plan for scoring individuals and teams in riding competition. The 
individual score chart includes nine items, such as mounting, dis- 
mounting, walk, trot, collected trot, each of which is subdivided 
into additional elements and assigned point values, for a maximum 
of 40 points. The team score card rates three items, walk, trot, and 
canter for a total of 15 points. The plan was devised empirically by 
experts and has undergone several revisions to increase the ease of 
its use in judging contests. 

Form Diagnosis Sheets. Trumbull8 describes a plan for devel-
oping and using Form Diagnosis Sheets in a variety of sports. The
basic elements of good form in the activity concerned are listed on 
a mimeographed sheet. Students are rated early in the term, and 
given a copy of the rating sheet so individual weaknesses may receive 
special attention. A second rating is made at the end of the term, 
and improvement noted. This plan represents the simplest form of 
ratings, and while more technically developed scales are needed, 
such ratings serve as a valuable teaching aid. 

Some Sample Rating Forms.9 Scott and French describe
procedures for constructing rating scales, and include sample rating
sheets or lists of elements of good form for use in building rating
scales for diving, swimming, posture, dance, softball batting form,
basketball and tennis, as well as achievement progressions in swimming
and in stunts and tumbling. Numerical rating scale techniques can be
applied to achievement progressions by giving a numerical value to
each step in the progression.

6Bennett, LaVerne Means, “A Test of Diving for Use in Beginning Classes,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 13, No. 1 (March,
1942), 109-115.

7Crabtree, Helen Kitner, “An Objective Test for Riding,” Jr. of Health and Phys.
Educ., Vol. 14, No. 8 (October, 1943), 419.

8Trumbull, Katherine, “Form Diagnosis,” Jr. of Health and Phys. Educ., Vol. 15,
No. 3 (March, 1944), 149.

9Scott, M. Gladys and French, Esther, Better Teaching Through Testing, Chapter
VII, “Achievement Ratings and Progression,” pp. 154-176. New York, A. S.
Barnes and Company, 1945.
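Giving a numerical value to each step of a progression can be shown in a few lines of Python. The steps listed are a hypothetical beginning-swimming sequence, not Scott and French's, and scoring a pupil by the value of the highest step passed is one possible convention, assumed here.

```python
# Hypothetical progression; each step is worth its position in the list.
SWIMMING_STEPS = [
    "bobbing",
    "prone float",
    "prone glide",
    "glide with flutter kick",
    "crawl stroke, 25 yards",
]


def progression_score(steps_passed):
    """Value (1, 2, 3, ...) of the highest step passed; 0 if none."""
    score = 0
    for value, step in enumerate(SWIMMING_STEPS, start=1):
        if step in steps_passed:
            score = value
    return score


print(progression_score({"bobbing", "prone float", "prone glide"}))  # 3
print(progression_score(set()))                                       # 0
```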

Self-Rating of Physical Fitness According to Definite 
Standards.10 This rating scheme was devised for use at the
McKinley Y.M.C.A. in Champaign, Illinois. The form includes 
twenty items listed under the three headings of Physique, Organic 
Efficiency, and Motor Fitness. Standards are set in each of the 
twenty items, and a 5-point scale indicates the degree to which the 
standard is approximated. 

Rating of Sports Officials (Women).11 In Chapter XI the
knowledge tests12 provided by the Officials' Rating Committee of
the National Section on Women's Athletics were described. Accom-
panying these knowledge tests in basketball, tennis, softball and
volleyball are standard rating sheets by which candidates for
officials' ratings are judged on their officiating ability under game
conditions. The rating scales are based upon 100 points. To obtain
a national rating, a minimum of 85 points must be scored on both
the practical rating and the knowledge test.
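The qualifying rule sets a minimum on each score separately, not on their average, as this brief Python sketch shows. It assumes that both the practical rating and the knowledge test are scored out of 100.

```python
NATIONAL_MINIMUM = 85  # points out of 100 on each part (assumed for both parts)


def earns_national_rating(practical_rating, knowledge_test):
    """True only if both the practical rating and the knowledge test reach the minimum."""
    return practical_rating >= NATIONAL_MINIMUM and knowledge_test >= NATIONAL_MINIMUM


print(earns_national_rating(88, 90))  # True
print(earns_national_rating(95, 80))  # False, although the average is 87.5
```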

Cureton's Multiple Rating Scales.13 Cureton presents norm-
ative tables developed from data gathered on men in college fitness
classes for a large number of physical fitness items. Norms are
given on the basis of seven classes: A+, Excellent; A, Very Good;
B, Above Average (Good); C+, Average; C, Below Average;
D, Poor; Failing, Very Poor. Multiple Rating Scales are available
for the following categories, each of which includes a number of
separate items: Endurance Running Events; Muscular Endurance
Events; Balance, Agility, Power; Systolic Blood Pressure; Cardio-
vascular Measurements and Indices; Respiratory Tests; Flexibility
Measurements; Fat Measurements; Muscular Development and
Strength; Posture and Foot Measurements; Chest Measurements;
and Skeletal Measurements and Weight.

10The Forty-Fifth Yearbook of the National Society for the Study of Education,
Part I, The Measurement of Understanding, “The Measurement of Understand-
ing in Physical Education,” pp. 252-250. Chicago, Illinois, The University of
Chicago Press, 1946.

11National Section on Women's Athletics, Women's National Officials' Rating
Committee, “Officials Rating Scales in Basketball, Tennis, Softball, and Volley-
ball.” Mimeographed.

12See page 226.

13Cureton, Thomas K., Jr., Physical Fitness Workbook, pp. 127-139. St. Louis,
The C. V. Mosby Company, 1947.


Program Score Cards 


A score card is a form of numerical rating scale. Those designed
to measure the over-all effectiveness of programs, or parts thereof,
generally list many factors concerned with program organization
and administration. Weighted numerical scores, usually determined
by expert opinion, are assigned to each factor. The scores can then
be summed for a final total. Score cards are useful in
indicating where specific administrative deficiencies may be found.
Obviously, though, the presence or absence of given material aspects of
program, facilities and the like is not an accurate method of judging
program worth, since the only final criterion must be the effect of
the program on pupil growth and development. While the importance
of the total score aspect of program score cards is minimized,
they still serve a useful purpose in calling attention to factors which
may be adversely affecting the conduct of the program.
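The following small Python sketch of a weighted score card shows why the item-by-item profile matters more than the total. The headings, items and point values are hypothetical. So is the 50 per cent threshold used to flag weak items. An actual card would take its weights from the expert opinion described above.

```python
# Hypothetical score card: heading -> list of (item, maximum points, points awarded).
SCORE_CARD = {
    "Facilities": [("Indoor play space", 40, 35), ("Outdoor play space", 40, 12)],
    "Program": [("Daily class period", 30, 30), ("Graded activities", 30, 20)],
}


def summarize(card, flag_below=0.5):
    """Sum awarded and maximum points, and list items scoring below
    `flag_below` of their maximum (an assumed threshold)."""
    total = 0
    maximum = 0
    weak_items = []
    for heading, items in card.items():
        for item, max_points, awarded in items:
            total += awarded
            maximum += max_points
            if awarded < flag_below * max_points:
                weak_items.append((heading, item, awarded, max_points))
    return total, maximum, weak_items


total, maximum, weak = summarize(SCORE_CARD)
print(f"Total: {total} of {maximum}")
for heading, item, awarded, max_points in weak:
    print(f"Check {heading}: {item} ({awarded}/{max_points})")
```

Here a respectable total of 97 out of 140 still hides an outdoor-space item that scores well under half its value. Score cards are best at revealing this kind of specific weakness.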

Score Cards for Secondary School Physical Education
Programs for Boys and Girls. Two score cards were developed
by the Division of Health and Physical Education of the California
State Department of Education for use in evaluating high school
physical education programs for boys¹⁴ and girls.¹⁵ It was recognized
that pupil achievement is the best basis for evaluating programs, but
because of the difficulty of measuring it adequately, the score cards
were developed to measure those factors which appear to be closely
correlated with pupil achievement. The
score cards include a large number of variables under such headings
as instructional staff, facilities, program organization, program
activities and professional assistance. Each variable has a score
value, each heading a total value, and a final score with a maximum
of 2000 points is computed by summing individual scores. The
score cards were designed as a check list to assist local schools in
locating particular areas where improvement might be indicated,


¹⁴ A Score Card for Evaluating Physical Education Programs for High School Boys.
Bulletin No. E-2. California State Department of Education, Division of
Health and Physical Education, 1931.

¹⁵ A Score Card for Evaluating Physical Education Programs for High School Girls.
Bulletin No. E-3. California State Department of Education, Division of
Health and Physical Education, 1931.


Rating Scales 243 


and it is recognized that the value of the score cards lies therein,
rather than in the magnitude of the total score. 

Score Cards for Elementary and Secondary School Health
and Physical Education.¹⁶ These score cards were designed to
direct attention to the characteristics of good programs of physical
education for both elementary and secondary schools, and to provide
local schools a somewhat objective basis by which to compare their
offerings with recommended practices. The purpose is to disclose
significant weaknesses rather than to rate schools. The cards were
developed in connection with the nine-year study by the Committee
on Curriculum Research of the College Physical Education Association.
Preliminary material for the score cards was submitted
to a large jury of leading state, city and rural administrators and
supervisors throughout the country. The score cards contain
items on the most significant factors of the activity program,
outdoor areas, indoor areas, organization and administration of
class programs, and medical examinations and health service.
Each item is given a weighted score, and total score standards
are indicated.

A Check List for the Survey of Health and Physical Education
Programs in Secondary Schools.¹⁷ This check list
includes the important elements of the school health program,
including medical service, health instruction, hygiene of the environment
and of the school program, and the physical education
program. The items are classified, and the list can be used
to record the status and existence of equipment, procedures and
activities.

Score Card for Y.M.C.A. Physical Education Programs.¹⁸
Waters combines a score card of program details and professional
standards for personnel engaged in physical education programs of
the Young Men's Christian Association. The major headings include
organization and administration, boys’ program, men’s program,
maintenance, and professional qualifications and standards of conduct.
The total point value of the scale is 1197 points.


¹⁶ LaPorte, William Ralph, Chr., Health and Physical Education Score Card.
No. I, For Elementary Schools. Los Angeles, California, The University of
California Press, 1938. (See also: No. II, For Secondary Schools.)

¹⁷ Dearborn, Terry H., A Check List for the Survey of Health and Physical Education
Programs in Secondary Schools. Stanford University, Stanford University Press.

¹⁸ Waters, M. L., Appraising Physical Education Programs in the YMCA. New
York, The Association Press, 1944.




Score Card for Physical Education Programs for Physically
Handicapped Children.¹⁹ This score card was developed to aid
in the evaluation of physical education programs for handicapped
children in public school systems. The major headings considered
are administration, personnel, services and activities. Each heading
is described by a list of statements. Three points are given for
maximum achievement of the standard described by each statement,
2 points for good to average, 1 point for fair to minimum, and
no points if the requirement is not met. The maximum score is 102.
Arbitrary standards of superior, good, fair and poor are indicated.
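The 3-2-1-0 scheme implies that the card contains 102 ÷ 3 = 34 statements. Below is a minimal Python sketch of the tally. The rating labels come from the text, but the example ratings are invented for illustration. This section does not give the cut-off points for superior, good, fair and poor, so the sketch reports only the total and its share of the maximum.

```python
POINTS = {
    "maximum achievement": 3,
    "good to average": 2,
    "fair to minimum": 1,
    "not met": 0,
}
MAX_SCORE = 102
N_STATEMENTS = MAX_SCORE // POINTS["maximum achievement"]  # 34 statements


def total_score(ratings):
    """Sum the points for a list of per-statement ratings."""
    if len(ratings) != N_STATEMENTS:
        raise ValueError(f"expected {N_STATEMENTS} ratings, got {len(ratings)}")
    return sum(POINTS[r] for r in ratings)


# Invented example: 20 statements fully met, 10 good to average, 4 not met.
example = ["maximum achievement"] * 20 + ["good to average"] * 10 + ["not met"] * 4
score = total_score(example)
print(score, f"{score / MAX_SCORE:.0%}")  # 80 78%
```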


Behavior Rating Scales 


The development of social efficiency is a recognized objective of
physical education, yet tools to appraise this objective accurately
are meager. Few scales directly concerned with measuring behavior
in physical education are available.

McCloy's Behavior Rating Scale.²⁰ Approaching the problem
of character building from the philosophical standpoint, McCloy
has outlined “a process of seeking specific character developments
through planned-for learnings, direct, associate and concomitant."
From a study of the objectives of the individual and of the teacher, 
a tentative selection of character objectives has been made, grouped 
under nine general headings. The behavior rating scale takes into 
account the frequency of observation of the particular quality in 
question as well as the assurance with which the rater forms his 
judgment. The nine general headings used in the scale include: 
(1) leadership, (2) positive or active qualities such as initiative, 
courage, decision, etc., (3) attitudes, (4) self-control, (5) coopera- 
tion, (6) sportsmanship, (7) ethical qualities, (8) qualities of 
efficiency and (9) sociability. In all, thirty-seven items or qualities 
are rated. 

O’Neel’s experimental study²¹ of behavior traits was based on

¹⁹ Landers, Julia Partington, “A Score Card for Evaluating Physical Education
Programs for Physically Handicapped Children,” Res. Quart. Am. Assoc. for
Health, Phys. Educ., and Rec., Vol. 16, No. 3 (October, 1945), 216-220.

²⁰ McCloy, Chas. H., “Character Building Through Physical Education,” Res.
Quart. Am. Phys. Educ. Assoc., Vol. I, No. 3 (October, 1930), 41-61.

²¹ O’Neel, F. W., “A Behavior Frequency Rating Scale for the Measurement of
Character and Personality in High School Physical Education Classes for Boys,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 2 (May, 1936), 67-76. A
master’s thesis at the State University of Iowa, 1931.




McCloy’s²² original list, and to this list were added other traits,
making a total of fifty items. The test items finally selected have
low reliability, and the proposed scale is not offered as a reliable
measure of character.

Blanchard,²³ in an experimental study, made a critical analysis
of the McCloy Behavior Rating Scale, modified the original trait
actions and proposed a new rating scale which includes the nine
general headings but only twenty-four items (or trait actions). In
the new scale, the rater's assurance column has been eliminated, the
reliability of the test items has been improved and the validity
increased.

In a further study by McCloy,²⁴ using an analysis by Spearman's
Two-Factor Method and the Thurstone technique of factor analysis,
the important common or fundamental elements in character are
reduced to three: positive self-feeling, the social factor, and the
factor of individual qualities. It appears that “these three common
elements may be very accurately measured by a number of properly
combined ratings, assuming, of course, that the individuals are well
known to the raters and that enough raters are used to validate the
results to within a reasonable degree of accuracy."²⁵ McCloy
believes that the physical educator should not attempt to rate a
great mass of detail in connection with character traits but should
instead rate development in these three fundamental qualities.

A Technique for Measuring Sportsmanship.²⁶ Lauritsen
suggests a method of appraising ability to recognize and apply
principles of sportsmanship. The technique involves the presentation
of problem situations in eight different sports. Various courses
of action are listed, from which those taking the test are asked to
select the one with which they agree. This is followed by
listed reasons, from which selections are made to justify the course
of action selected. Valuable clues regarding student attitudes can
be noted from a test of this type, but such a procedure is subject to
²² McCloy, C. H., op. cit.

²³ Blanchard, B. E., Jr., “A Behavior Frequency Rating Scale for the Measurement
of Character and Personality in Physical Education Classroom Situations,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. VII, No. 2 (May, 1936), 56-66.

²⁴ McCloy, C. H., “General Elements in Character,” Supp. to Res. Quart. Am.
Phys. Educ. Assoc., Vol. VI, No. 3 (October, 1935), 99-109.

²⁵ Ibid., p. 107.

²⁶ Lauritsen, William H., Some Techniques for Measuring Achievement of Objectives
in Physical Education. Unpublished doctoral study, Columbus, Ohio, Ohio State
University, 1939.


300 Tools of Measurement 


Display of Comparison. Histograms and bar displays are useful
in making clear comparative statistics or records, such as scores
made by women compared to scores made by men, or results
obtained in different years in the same event. This method is less
confusing than the method of superimposing one frequency polygon
on another.


[Fig. 18: bar graph by class (Freshman, Sophomore, Junior, Senior); clear area = majors in physical education, 1942; shaded area = majors in physical education, 1943.]
Fig. 18. Showing comparison in enrollment of major students in a selected
college for the years 1942-43.


The Ogive Curve or Percentile Graph. It is very often desir- 
able to show by means of a diagram the number of cases or per cent 
of the population in a distribution lying above or below a given 
mark. In order to plot an ogive curve the scores in the distribution 
must be added in a cumulative fashion as shown in Table XVIII 
(Data on Standing Broad Jump, from a college freshmen group). 

The method of laying out the ogive curve may be described as 
follows: 

1. Divide the X-axis into divisions to be used as the intervals of
the distribution. Each cumulative frequency is to be plotted not at
the midpoint of its interval but at the upper limit, because the
cumulative count for that interval includes all cases up to the
beginning of the next succeeding interval.
2. Lay off the Y-axis into divisions to be used in representing the
cumulative frequencies.
3. Starting at the first interval to the right of the origin, mark in
order the points of the curve according to interval and cumulative
frequency at each interval.


Elementary Graphical Methods 301 


4. Join the points by a French curve. Some smoothing may be
necessary where the curve is irregular.
5. Erect a perpendicular on the X-axis at the upper limit of the
last interval and continue it until it reaches the curve. Divide this
line into ten equal parts to locate the deciles (the 10th, 20th, 30th,
and so on, percentiles) of the distribution.
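The construction can also be done numerically. Table XVIII's figures are not reproduced in this section, so the Python sketch below borrows the grouped standing broad jump frequencies of Table IX (3-inch intervals, data to the nearest inch). The sketch rests on two assumptions. First, cumulative frequencies are placed at the real upper limits, for example 62.5 inches for the interval 5'0"-5'2". Second, straight lines between the plotted points replace the French curve of step 4, so percentiles are read by linear interpolation.

```python
# Grouped standing broad jump data from Table IX, lowest interval first.
# 5'0"-5'2" is 60-62 inches; each interval is 3 inches wide.
LOWEST_START = 60
WIDTH = 3
FREQS = [3, 4, 2, 3, 6, 13, 7, 8, 11, 6, 8, 5, 2, 1, 2]  # N = 81


def ogive_points(lowest_start, width, freqs):
    """Return (real upper limit, cumulative frequency) pairs (steps 1-3)."""
    points = []
    running = 0
    for i, f in enumerate(freqs):
        running += f
        upper = lowest_start + (i + 1) * width - 0.5  # e.g. 62 -> 62.5
        points.append((upper, running))
    return points


def percentile_from_ogive(p, lowest_start, width, freqs):
    """Read the p-th percentile off the ogive by straight-line interpolation."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    target = p / 100 * sum(freqs)
    below = 0
    for i, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            lower_real = lowest_start + i * width - 0.5
            return lower_real + (target - below) / f * width
        below += f
    raise ValueError("no interval reaches the target")


for upper, cum in ogive_points(LOWEST_START, WIDTH, FREQS):
    print(f"{upper:6.1f} in.  {cum:3d}")

# Step 5: the ten equal divisions of the perpendicular give the deciles.
for p in range(10, 100, 10):
    x = percentile_from_ogive(p, LOWEST_START, WIDTH, FREQS)
    print(f"P{p}: {x:.1f} in. ({int(x // 12)} ft {x % 12:.1f} in.)")
```

For example, the 75th percentile corresponds to 60.75 cases, which falls in the 7'3"-7'5" interval: 86.5 + (60.75 - 57) / 6 × 3 = 88.4 inches, or about 7 feet 4 inches.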


[Fig. 19: ogive curve; vertical axis, cumulative frequency scale; horizontal axis, standing broad jump in feet and inches.]
Fig. 19. Ogive curve from the data of Table XVIII.


Such a curve as the one illustrated may be used to estimate with
comparative speed and accuracy any desired percentile value. It
may also be used to compare distributions of the same function
having different interval values. Such a curve as the “ogive” plotted
in Fig. 19 will tell us quickly how far in the standing broad jump
75 per cent or 35 per cent or any other per cent of the students can
jump. The line AX is the perpendicular let fall from the highest
cumulative frequency. On this we find the point 75 per cent of the
distance from the base line OX; from this “75 per cent" point we
run a horizontal line to cut the curve, and then drop another line
perpendicularly to the line OX. Where this latter line cuts the OX




the weakness of all attitude scales. That is, does the student answer 
the way he really feels, or the way he thinks he is expected by higher 
authority to feel? The careful construction and naturalness of the 
subject matter of the test, however, tend to keep this limitation
to a minimum. 

Social Efficiency. The social efficiency objective of physical 
education implies a concept of the individual as a total personality. 
Adjustment in the physical education environment is significant 
not just in itself, but as a ramification of the total behavior patterns 
of the individual. It follows, then, that from the general field of 
measurement of human behavior valuable tools for both guidance 
and research are available to the physical educator. Much work 
has been done in recent years in developing measures of human 
behavior, and a significant number of effective tests and scales for 
measuring personality, attitudes, interests, social adjustment, and 
social status may be found. The student of measurement in physical 
education should keep abreast of advances in this area. 

Special preparation outside that normally given in physical
education measurement is needed if the student wishes to utilize
this type of measurement tool. These tools must not be used casually
by the untrained. An examination of some of the outstanding
books²⁷ in this area will reveal its scope, magnitude and importance.
In view of its complexity and extent, therefore, it does not seem
wise to attempt to summarize this special field in this text. It is
important, however, for the student of physical education
to be familiar with all available tools to appraise the social
efficiency objectives of physical education, and knowledge and

²⁷ Buros, Oscar Krisen, The Third Mental Measurements Yearbook. New Brunswick,
Rutgers University Press, 1949. Pp. 1047.
Bingham, Walter Van Dyke, Aptitudes and Aptitude Testing. New York, Harper
and Brothers, Publishers, 1937. Pp. 390.
Cattell, Raymond B., Description and Measurement of Personality. Yonkers-on-
Hudson, New York, The World Book Company, 1946. Pp. 602.
Greene, Edward B., Measurement of Human Behavior. New York, The Odyssey
Press, 1941. Pp. 777.
Hildreth, Gertrude H., A Bibliography of Mental Tests and Rating Scales. New
York, Psychological Corporation, 1939. Also: Bibliography of Mental Tests and
Rating Scales, 1946 Supplement. New York, Psychological Corporation, 1945.
Mursell, James L., Psychological Testing. New York, Longmans, Green and
Company, 1947. Pp. 449.
Symonds, Percival M., Diagnosing Personality and Conduct. New York, D.
Appleton-Century Company, 1931. Pp. 602.




experience in this type of measurement should be sought. A basic need
of physical education measurement is for cooperative effort among 
sociologists, psychologists and physical educators in order that more 
adequate measurement tools in this important phase of the physical 
education program can be developed. 

Despite the progress in the development of social efficiency
measurement tools in general education, these still are subject to
limitations. Even the more widely used personality inventories,
social adjustment inventories, and other rating devices lack
well-established validity and reliability. The more advanced clinical
psychological tests require more training than the average classroom
teacher has. The lack of wholly adequate tools to measure the
objectives of social efficiency, combined with present emphasis on
their importance, has resulted in increasing use by teachers of
qualitative evaluative procedures, such as check lists, anecdotal
records, observational records, autobiographies, questionnaires,
self-appraisals and the like, to supplement the measurement program.
It is beyond the scope of this text, however, to deal at length
with evaluative techniques other than measurement. The student
is referred in particular to the references at the end of Chapter I for
recent source material on the qualitative aspects of the evaluation
program.


Selected References 


BARR, A. S., BURTON, WILLIAM H. and BRUECKNER, LEO J.: Supervision. New
York, D. Appleton-Century Company, 1947. Pp. 879.

Chapter V, “The Appraisal of the Educational Product,” contains an excellent
résumé of the use of rating scales, types, limitations, and a description of some
of the more widely used rating scales of attitudes, habits and behavior.

BLANCHARD, B. EVERARD: “A Comparative Analysis of Secondary-School Boys'
and Girls' Character and Personality Traits in Physical Education Classes,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 17, No. 1 (March,
1946), 33-39.

Gives results of a two-year study using Blanchard's Behavior Rating Scale,
and indicates some positive effects of physical education on
character.

BUROS, OSCAR KRISEN: The Third Mental Measurements Yearbook. New Brunswick,
Rutgers University Press, 1949. Pp. 1047.

Contains a complete listing, with annotations and reviews, of all educational
mental tests. See particularly the sections on Character and Personality,
pp. 51-113, and Health, pp. 475-480.




CATTELL, RAYMOND B.: Description and Measurement of Personality. Yonkers-on- 
Hudson, New York, The World Book Company, 1946. Pp. 602. 

Surveys, integrates, and interprets research and knowledge in the field of 
personality measurement. See especially Chapter VIII, “Principal Surface 
Traits Discovered Through Behavior Rating,” and Chapter IX, “Principal 
Source Traits Discovered Through Behavior Rating.” 


ESPENSCHADE, ANNA: “Selection of Women Major Students in Physical Education,”
Res. Quart. Am. Assoc. for Health, Phys. Educ., and Rec., Vol. 19, No. 2
(May, 1948), 70-76.

Reports findings of a study to determine factors which characterize successful
and less successful major students, and superior and average teachers of physical
education. Includes a bibliography of related researches.


GOOD, CARTER V., BARR, A. S. and SCATES, DOUGLAS E.: The Methodology of
Educational Research. New York, D. Appleton-Century Company, 1935.
Pp. 890.

Discusses the construction and use of rating scales as a research technique,
including rating of courses of study, pp. 697-699; rating of educational
institutions, pp. 471-476; rating of personality, pp. 427-429; and rating of
teachers, pp. 425-427.

GREENE, EDWARD B.: Measurement of Human Behavior. New York, The Odyssey
Press, 1941. Pp. 777. 

Chapter XXII, "Standard Deviation or Absolute Scaling", describes mathe- 
matical problems of scale construction; Chapter 25, "The Evaluation of
Judgements", considers different types of rating scales, advantages of various 
forms, types of errors, and the validity of ratings. 

LAWSHE, C. H., JR.: Principles of Personnel Testing. New York, McGraw-Hill
Book Company, Inc., 1948. Pp. 227. 

Includes a section on the weakness of rating scales, differences among raters, 
the halo effect, and ranking methods. Contains a list and related treatment of 
temperament and personality tests, interest tests and other types. 

LUNDBERG, GEORGE A.: Social Research. New York, Longmans, Green and Co., 
1942. Pp. 426. 

Chapter VII, “The Measurement of Attitudes and Opinion”, deals with 
problems of measuring opinions and attitudes; the construction and use of 
rating scales, including the mathematical techniques involved; and the limita- 
tions of rating scales. 

MONROE, WALTER S., Editor: The Encyclopedia of Educational Research, “Rating 
Methods”, pp. 887-891; “Personality”, pp. 785-794. New York, The Macmillan 
Company, 1941. Pp. 1344.

Includes a complete listing and analysis of significant research and measure- 
ments in the educational field. 

PALMER, IRENE: “Personal Qualities of Women Teachers of Physical Education,”
Res. Quart. Am. Phys. Educ. Assoc., Vol. IV, No. 4 (December, 1933), 31-48.

This study reports the application of the Bernreuter Personality Inventory to
recent graduates in physical education and points out significant differences in
personality between the more successful teachers and those not so successful.

REMMERS, H. H. and GAGE, N. L.: Educational Measurement and Evaluation.
New York, Harper and Brothers, 1943. Pp. 580. 

Chapter XVI, “Adjustment”, is concerned with difficulties in evaluating 
adjustment, problems of validation, and available adjustment evaluation tech- 
niques including types of rating devices. 


PART II
Tools of Measurement:
A Brief Outline of Statistical Methods


CHAPTER XIII 


Elementary Statistical Methods 


The undergraduate student of physical education today is confronted
with a large mass of reference material which has to do with measurement.
If he is to digest this material with any degree of intelligence,
it becomes necessary for him to have an understanding of elementary
statistical procedures. Chapters XIII and XIV attempt to present
in very brief form a résumé of what is ordinarily offered in a text on
educational statistics in two or three hundred pages. As a consequence,
the student working without guidance and the teacher of a
professional course must rely on the more detailed explanations to be
found in texts on statistics. Everyone will immediately recognize
the almost impossible task of attempting to present a course in
elementary statistical methods in two rather brief chapters, but the
scope of this text does not permit full explanation. It is hoped,
however, that the careful student may be able to understand the
material presented here despite the limitations placed upon its
presentation.

In a first course in Physical Education Tests and Measurements, the
instructor must choose carefully the material to be presented. This
will probably not include more advanced material than is to be
found in the present chapter and its application to the solution of
problems suggested in Chapters XV and XVI, depending, naturally,
upon the unit value of the course. Occasionally it is possible to
present the Pearson product-moment method of correlation. If
time does not permit this presentation, at least the meaning of
correlation should be discussed, because of its importance and widespread
use in literature dealing with measurement. The instructor




must plan his presentation very carefully in order to allow as much 
time as possible for practical applications of the statistical tech- 
niques. The text problems should not only be read carefully but 
the actual computations should be made so that facility in the use 
of the various procedures may be gained. 

Only the advanced student who wishes to do experimental work
in measurement should attempt to familiarize himself with the
more advanced procedures outlined in Chapter XIV and Part III. 
The elementary student without a good basic knowledge becomes 
hopelessly involved when attempting problems in these portions 
of the text. 


TABLE VIII
RECORDS OF FIFTEEN-YEAR-OLD BOYS IN THE STANDING BROAD JUMP
Data Shown in Feet and Inches and Collected to the Nearest Inch


8-8* 6-11 | 6-2 7-2 6-4 7-9. 6:8: | 226 7-10 


6-8 7-6 6-10 7-0 7-0 6-8 7-2 6-4 6-10 


6-1 7-9 5-9 6-6 5-5 7-1] 5-8 VEL 6-5 


8-2 6-9 5-5 6-5 7-6 6-9 6-5 7-2 8-5 


8-6 6-11 | 5-4 7-2 7-1 6-0 6-5 6-5 7-7 


7-10 6-4 7-0 6-11 5-9 5-9 6-7 7-8 7-4 


6-6 6-4 5-2 7-8 6-1 7-5 6-11 6-3 7-4 


6-5 7-6 5-0** | 6-2 6-7 5-5 6-1 7-3 7-3 


7-3 8-1 7-0 6-5 7-6 6-0 7-0 5-6 6-5 


The asterisk (*) marks the best performance to be found and the double 
asterisk (**) marks the poorest performance. To obtain the range in this particular 
set of data subtract 5-0 from 8-8, thus giving the range as 3-8, that is, 3 feet 
8 inches, or 44 inches. 
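The records mix feet and inches, so the subtraction is easiest after converting to inches. Here is a minimal Python sketch. It assumes records are written as "feet-inches" strings in the table's own notation:

```python
def to_inches(record: str) -> int:
    """Convert a record such as '8-8' (8 feet 8 inches) to inches."""
    feet, inches = record.split("-")
    return int(feet) * 12 + int(inches)


def score_range(best: str, poorest: str) -> int:
    """Range = best performance minus poorest performance, in inches."""
    return to_inches(best) - to_inches(poorest)


r = score_range("8-8", "5-0")
feet, inches = divmod(r, 12)
print(f"{r} inches = {feet} ft {inches} in")  # 44 inches = 3 ft 8 in
```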


I. The Frequency Distribution. Test data collected in the 
field are little more than a series of numbers or figures and have no 
meaning until they have been assembled in some logical way. In 
statistics this logical arrangement is called a frequency distribution — 
a grouping of all the scores made into various intervals or classes. 
In order to determine the size of these intervals or classes, it is


Elementary Statistical Methods 253 


necessary to inspect the scores made to note the range in perform- 
ance. The range is obtained by subtracting the poorest performance 
from the best performance. 

Selecting the Size of Each Interval. It is a rather common
practice among statisticians to select the size of the interval in such
a way that the number of intervals will be not less than 10 nor more
than 20. When more than 20 intervals are used, the data become
rather cumbersome to handle, and when fewer than 10 intervals are
used the data may be warped somewhat by such a grouping. This
is sometimes referred to as grouping too coarsely.

In the problem at hand (see Table VIII), with a range of 44
inches, a 3-inch interval will yield approximately 15 intervals
(44/3) and will therefore be appropriate for use. If the range were
70 inches, a 5-inch interval would be satisfactory. In general, when
the range divided by a selected size of interval is found to be between
10 and 20, preferably around 15, that size of interval will be
satisfactory.
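The rule of thumb can be written as a search over whole-number interval sizes. This Python sketch sorts odd sizes first, so that midpoints fall on whole units, as the next paragraph recommends. It then prefers a range ÷ size quotient close to 15. This ordering is an assumption that simply combines the two recommendations in the text. Note also that range ÷ size slightly undercounts the intervals actually needed: 44 ÷ 3 ≈ 14.7, while Table IX uses 15.

```python
def candidate_interval_sizes(score_range: int, low: int = 10, high: int = 20, target: int = 15):
    """Interval sizes for which score_range / size lies between low and high.

    Sorted with odd sizes first, then by how close the quotient is to target.
    """
    sizes = [s for s in range(1, score_range + 1) if low <= score_range / s <= high]
    return sorted(sizes, key=lambda s: (s % 2 == 0, abs(score_range / s - target)))


print(candidate_interval_sizes(44))  # [3, 4]
print(candidate_interval_sizes(70))  # [5, 7, 4, 6]
```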

In collecting physical education data we are most often concerned
with measurements to the nearest foot, nearest inch, nearest tenth-second,
etc., and it is therefore advisable, whenever practicable, to
use an odd number of units in the interval so that the midpoint of
the interval will represent an actual number collected in the field.
This point will be discussed more fully when midpoints of intervals
are considered.
The next question which should arise is in regard to the actual
selection of the intervals. In other words, where shall the first
interval start and end, and how shall the set of intervals be listed
on the tabulation sheet? It is customary to list the intervals on the
sheet in such a way that the interval containing the best performance
will be placed at the top of the column. The first interval, then,
must be one which contains the best performance made, namely,
8-8. Shall the interval begin with 8-8, end with 8-8, or shall 8-8 be
the middle one of the three numbers?

There is no set rule about procedure in this regard, but some
statistical workers have taken the position that we should start
from a theoretical zero in setting up intervals. Undoubtedly this
method is used to eliminate from the distribution any influence
which the actual data might exert upon the statistician. In starting
from a theoretical zero, the student need only remember that the
beginning number of the interval should be divisible by the size of




TABLE IX
FREQUENCY DISTRIBUTION OF THE DATA OF TABLE VIII
Data Collected to the Nearest Inch

Interval        Tally              Frequency
8'6"-8'8"       ||                      2
8'3"-8'5"       |                       1
8'0"-8'2"       ||                      2
7'9"-7'11"      |||||                   5
7'6"-7'8"       ||||| |||               8
7'3"-7'5"       ||||| |                 6
7'0"-7'2"       ||||| ||||| |          11
6'9"-6'11"      ||||| |||               8
6'6"-6'8"       ||||| ||                7
6'3"-6'5"       ||||| ||||| |||        13
6'0"-6'2"       ||||| |                 6
5'9"-5'11"      |||                     3
5'6"-5'8"       ||                      2
5'3"-5'5"       ||||                    4
5'0"-5'2"       |||                     3

                              N equals 81


the interval. Thus in our problem, the beginning of the top interval 
should be divisible by 3 and the interval should contain 8 feet 8 
inches. The beginning must therefore be 8 feet 6 inches and the 
interval listed as 8-6 to 8-8. To secure the second interval down the 
column, subtract 3 from both of these figures and continue with 
this process. 

It should be noted that the designations of an interval are
inclusive; that is, the interval 5'0"-5'2" signifies that the performances
of 5'0", 5'1", and 5'2" are included in that interval.




In making a frequency distribution of the data of Table VIII,
simply tally each performance in its proper interval as shown in
Table IX. Then transfer the total tallies to the frequency column
and add this column to get the total number of frequencies, designated
in statistics by N.
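The procedure of the last few paragraphs can be sketched in Python. Intervals are laid off from a theoretical zero, with the best interval at the top and inclusive limits, and then each record is tallied. Working in inches, the top interval's lower limit is the largest multiple of the interval size that does not exceed the best score. The function names are illustrative only. To keep the example short, it tallies just the nine records in the second row of Table VIII, while taking the intervals from the whole table's best and poorest records (8-8 and 5-0).

```python
from collections import Counter


def to_inches(record):
    """Convert a 'feet-inches' record such as '6-10' to inches (82)."""
    feet, inches = record.split("-")
    return int(feet) * 12 + int(inches)


def fmt(inches):
    """Format inches as feet and inches, e.g. 82 -> 6 ft 10 in."""
    return f"{inches // 12}'{inches % 12}\""


def intervals_from_zero(best, poorest, size):
    """Inclusive (low, high) intervals, best interval first.

    Every lower limit is divisible by `size`, i.e. the intervals are
    laid off from a theoretical zero.
    """
    low = (best // size) * size
    intervals = []
    while low + size - 1 >= poorest:
        intervals.append((low, low + size - 1))
        low -= size
    return intervals


def tally(records, intervals):
    """Count how many records fall in each inclusive interval."""
    counts = Counter()
    for record in records:
        x = to_inches(record)
        for low, high in intervals:
            if low <= x <= high:
                counts[(low, high)] += 1
                break
    return counts


ROW = ["6-8", "7-6", "6-10", "7-0", "7-0", "6-8", "7-2", "6-4", "6-10"]
INTERVALS = intervals_from_zero(to_inches("8-8"), to_inches("5-0"), 3)
counts = tally(ROW, INTERVALS)
for low, high in INTERVALS:
    f = counts[(low, high)]
    print(f"{fmt(low):>6}-{fmt(high):<6} {'|' * f:<6} {f}")
print("N =", sum(counts.values()))
```

The 15 intervals produced match those of Table IX, from 8'6"-8'8" down to 5'0"-5'2".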

The data of Tables X and XI are offered to familiarize the student
with the method of making a frequency distribution from various
types of material presented as original (or raw) scores.

Midpoint. When scores or performances have been grouped
into frequency distributions as in the data of Tables IX, X, and XI,
the frequencies in any interval lose their identity in so far as individual
performances are concerned. In calculating various statistical
measures, it is therefore necessary to assign one value which best
represents all the performances in a given interval. Statisticians
have agreed that the value which best represents these collected
performances will normally be the midpoint, since the assumption
is made that the performances are distributed evenly throughout
the interval.
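When data are recorded to the nearest inch, each interval's real limits extend half a unit beyond its printed limits (the text returns to this point below). This minimal Python sketch shows why an odd interval size is recommended. The midpoint of a 3-inch interval falls on a whole inch that could actually have been recorded, whereas a 4-inch interval gives a half inch.

```python
def midpoint(low: int, high: int) -> float:
    """Midpoint of an inclusive interval of whole units.

    Real limits run half a unit beyond the printed ones
    (60-62 inches is really 59.5 to 62.5), but the midpoint is the same.
    """
    real_low, real_high = low - 0.5, high + 0.5
    return (real_low + real_high) / 2


print(midpoint(60, 62))  # 61.0, i.e. 5 ft 1 in, a recordable value
print(midpoint(60, 63))  # 61.5, which falls between recorded inches
```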

Students working with physical education performance data
collected in the field must be extremely careful as to how data are
collected. It is obviously a waste of time and energy to collect data
in the field to the nearest ½ inch if these data are to be placed in a


TABLE X
RECORDS OF FOURTEEN-YEAR-OLD BOYS IN THE BASKETBALL THROW
FOR DISTANCE, DATA COLLECTED TO THE NEAREST INCH

A. Original Scores in Feet and Inches

47-5    57-10   53-11   45-1    49-5
55-2    51-2    51-1    54-7    52-1
51-11   49-1    52-4    62-5    52-8
49-11   54-10   53-5    51-7    53-0
50-2    46-7    59-10   49-4    55-11
47-10   44-8    56-9    45-10   53-3
48-2    *40-1   50-7    58-6    56-5
44-9    44-4    42-0    59-4    52-11
?       58-2    61-1    55-0    55-8
*64-2   54-5    57-5    51-10   50-4

* Maximum and minimum scores for range. Range equals 24'1", that is,
64'2" (the highest score) minus 40'1" (the lowest).




THE SAME SCORES GROUPED IN A FREQUENCY DISTRIBUTION TO SHOW EFFECT
OF SIZE OF INTERVAL

With intervals of 18 inches

Scores in feet and inches    Tabulation     Frequency
63'6"-64'11"                 |                  1
62'0"-63'5"                  |                  1
60'6"-61'11"                 |                  1
59'0"-60'5"                  ||                 2
57'6"-58'11"                 |||                3
56'0"-57'5"                  |||                3
54'6"-55'11"                 ||||| |            6
53'0"-54'5"                  |||||              5
51'6"-52'11"                 ||||| ||           7
50'0"-51'5"                  |||||              5
48'6"-49'11"                 ||||               4
47'0"-48'5"                  |||                3
45'6"-46'11"                 ||                 2
44'0"-45'5"                  ||||               4
42'6"-43'11"                 |                  1
41'0"-42'5"                  |                  1
39'6"-40'11"                 |                  1
                                           N = 50

With intervals of 2 feet

Scores in feet and inches    Tabulation     Frequency
63'0"-64'11"                 |                  1
61'0"-62'11"                 ||                 2
59'0"-60'11"                 ||                 2
57'0"-58'11"                 ||||               4
55'0"-56'11"                 ||||| |            6
53'0"-54'11"                 ||||| ||           7
51'0"-52'11"                 ||||| ||||         9
49'0"-50'11"                 ||||| ||           7
47'0"-48'11"                 |||                3
45'0"-46'11"                 |||                3
43'0"-44'11"                 ||||               4
41'0"-42'11"                 |                  1
39'0"-40'11"                 |                  1
                                           N = 50


"N" is the symbol used for the total number of measures or scores taken.
"f" is the symbol used for the frequency of scores made in any one class.

Σ is the Greek capital letter sigma and stands for "the sum of"; i.e., Σf = N, or the sum of all the frequencies of the classes equals N.


frequency distribution where the intervals are 3 or 5 inches in size. In such instances performances in the field may be noted as taken to the nearest inch. Students, however, must stop to consider what a performance of 5 feet 6 inches means when data are collected to the nearest inch. The common procedure is to class as 5 feet 6 inches any performance ranging between 5 feet 5½ inches and 5 feet 6½ inches (not including the exact performance of 5 feet 6½ inches). In situations where data are collected to the nearest inch and we have an interval listed as 5-6 to 5-8, this interval may be thought of theoretically as 5-5.5 to 5-8.49. This explanation is necessary to avoid confusion in securing the midpoint, since the midpoint is


Elementary Statistical Methods 257 


TABLE XI
50-YARD DASH RECORDS, COLLEGE WOMEN
Original Scores, Seconds and Fifths


*(F) | 8-4 7-4 7-4 7-2 7-1 8-1 6-5 7-4 
7-1 8-0 6-4 7-4 8-4 7-1 6-3 8-1 7-2 
7-4 8-0 7-2 8-2 7-2 8-0 7-0 8-2 8-5 
7—5 7-2 7-4 6-4 7-4 (5) 7-4 7-4 7-1 
Zl 8-1 8-1 7-3 8-2 7-0 7-3 7-2 6-2 
[4 8-1 7-0 8-1 7-4 7-1 7-4 7-2 7-0 
ded 9-2 7-2 8-0 8-0 8-0 7-0 8-0 7-0 
8-1 | 82 | во | 74 | 80 | 62 | 74 | 70 | 89 
7-8 7-4 7-4 7-4 7-8 7-2 7-0 7-3 
7-4 7-1 7-2 7-0 6-4 7-4 6-4 7-0 
8-2 8-0 8-0 7-2 9-0 6-4 6-5 6-4 
*F represents Freshmen and S Sophomores.

SAME SCORES GROUPED IN A FREQUENCY DISTRIBUTION
Class Interval Chosen at One-Fifth of a Second

Scores (seconds-fifths)    Tabulation                  Frequency (f)
6-2                        ||                          2
6-3                        |||                         3
6-4                        ||||/ |                     6
7-0                        ||||/ ||||/                 10
7-1                        ||||/ ||                    7
7-2                        ||||/ ||||/ |               11
7-3                        ||||/ |                     6
7-4                        ||||/ ||||/ ||||/ ||||/     20
8-0                        ||||/ ||||/ ||              12
8-1                        ||||/ |||                   8
8-2                        ||||                        4
8-3                        |                           1
8-4                        ||                          2
9-0                        |                           1
9-1                                                    0
9-2                        |                           1
                                                       N = 94


found by adding half the size of the interval to the lower limit. To illustrate the point, suppose half the size of the interval (1½) is added to the lower limit (5-5.5). We then obtain a midpoint of 5-7. If, however, no thought is given as to how data are collected, students may consider that, in an interval listed as 5-6 to 5-8, half the size of the interval (1½) should be added to the 5-6, making the midpoint appear as 5-7.5.

The same theory holds true in other types of units. In a baseball 
throw for distance, interval 70-74, with data collected to the nearest 
foot, the midpoint is found by adding 2.5 (half the size) to the bottom 
of the interval (69.5), the result being 72. In an event involving 
time, such as a dash, it should be assumed that the figure read on the 
watch, as for example, 12.4, represents in itself a midpoint. There- 
fore, when any number of figures is collected in an interval, the 
designation of “time collected to the nearest one-tenth second" 
should be placed on the distribution. The interval representing 
12.4 will then theoretically be 12.35-12.44, and an interval shown 
as 12.3 to 12.5 will have a midpoint of 12.4 (12.25+0.15). 
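A short Python sketch can check the rule for data collected "to the nearest" unit. It assumes the listed limits and the unit are in the same measure (inches, feet or seconds). `midpoint_nearest` is a hypothetical name.

```python
def midpoint_nearest(listed_lower, listed_upper, unit):
    """Midpoint of an interval whose scores were recorded to the nearest `unit`.

    The real limits extend half a unit beyond the listed limits, so the
    real width is (listed_upper - listed_lower) + unit.
    """
    real_lower = listed_lower - unit / 2
    width = (listed_upper - listed_lower) + unit
    return real_lower + width / 2


# Interval 5-6 to 5-8 recorded to the nearest inch (66 to 68 inches): 5-7.
print(midpoint_nearest(66, 68, 1))
# Baseball throw, interval 70-74, recorded to the nearest foot.
print(midpoint_nearest(70, 74, 1))
# Dash, interval 12.3 to 12.5, recorded to the nearest tenth second.
print(round(midpoint_nearest(12.3, 12.5, 0.1), 2))
```

Algebraically, the real lower limit plus half the real width reduces to the average of the two listed limits. The error described above comes from adding that half width (1½ inches for 5-6 to 5-8) to the listed lower limit instead of the real one.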

In such events as the pull-up or chin, the dip on the parallel bars, 
the high jump and the bar vault, students must recognize the con- 
ditions under which data are collected. They are not collected to 
the nearest something by the very nature of the event. In order to 
be given credit for a performance of three pull-ups, the performer 
must actually do three pull-ups. In a high jump or bar vault of 
4 feet 6 inches, that exact height (no less) must be cleared. The low 
point of the interval in such instances will be the actual performance 
listed. Several examples will serve to illustrate the procedure. 


Event        Interval       Midpoint
Pull-up      5-7            5.0 + 1.5 = 6.5
High jump    4-3 to 4-4     4-3.0 + 1 = 4-4.0
Bar vault    4-6 to 4-11    4-6.0 + 3 = 4-9.0
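In some events a performance counts only if it is actually reached. There the listed lower limit is itself the low point of the interval. Below is a hedged sketch of that case, with heights converted to inches. The function names are illustrative only.

```python
def midpoint_exact(listed_lower, listed_upper, unit=1):
    """Midpoint when a performance counts only if actually reached.

    The listed lower limit is the low point of the interval, and the
    interval covers (listed_upper - listed_lower + unit) units.
    """
    width = (listed_upper - listed_lower) + unit
    return listed_lower + width / 2


def inches(feet, inch):
    return feet * 12 + inch


def as_feet_inches(total):
    feet, rest = divmod(total, 12)
    return f"{int(feet)}-{rest:.1f}"


print(midpoint_exact(5, 7))                                          # pull-up
print(as_feet_inches(midpoint_exact(inches(4, 3), inches(4, 4))))    # high jump
print(as_feet_inches(midpoint_exact(inches(4, 6), inches(4, 11))))   # bar vault
```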


In still another type of data common in physical education we 
have what is known as discrete or discontinuous measures. These 
are measures which cannot be subdivided, such as basketball goal 
throws or target hits on a baseball target. It is impossible to make
4½ goal throws since a performer either makes 4 or 5. In such
series, a performance, such as 4, represents the midpoint of the 
interval and also its lower and upper limits. This does not mean, 
however, that the average goal throw of a group cannot be expressed 
in the form of a decimal, such as 4.77. 




The Normal Probability Curve. When distributions of large numbers of cases of most physical traits are plotted they tend to approach the form of the normal probability curve represented in Fig. 8. In reality very few distributions are absolutely symmetrical such as the one indicated in Fig. 8, but if the distribution tends to follow that type of curve in a general way, we say that it is normal and hence one which has been governed by the laws of chance.

[Figure 8: normal probability curve; baseline marked −3σ, −2σ, −1σ, Mean, +1σ, +2σ, +3σ]

Fig. 8. A theoretical distribution showing the form of the normal probability curve, with the mean at the highest point of the curve.

[Figure 9: frequency (vertical axis) plotted against scores in feet (horizontal axis)]

Fig. 9. Showing an approximation to a normal curve. The dotted lines show the curve of the original data smoothed by inspection.

Fig. 9 represents a curve plotted from a sampling of records of fourteen-year-old boys in the basketball throw for distance. The actual data necessary for plotting such a curve are as follows:


DATA COLLECTED TO THE NEAREST FOOT

Scores in feet    Frequency
63-65             1
60-62             3
57-59             6
54-56             9
51-53             13
48-50             8
45-47             6
42-44             3
39-41             1
                  N = 50


In statistical procedures, the properties of the normal probability curve, or normal probability surface as it is sometimes called, are very important and can be treated mathematically. By the use of these properties statisticians are able to compute the percentage of individuals who may be expected to attain a certain level of performance, the chances that an individual's performance will lie between certain given limits, the relative difficulty of a large number of stunts, and so on. Further, the properties of the normal probability surface are of special value in scale construction.
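Python's standard-library `statistics.NormalDist` returns areas under the normal curve, which illustrates the kind of computation meant here. The sketch assumes a perfectly normal distribution. The group in the last line (mean 50 feet, σ 8 feet) is hypothetical.

```python
from statistics import NormalDist


def share_between(mean, sd, low, high):
    """Expected proportion of a normal population scoring between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


# Proportions lying within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"mean ± {k} sigma: {share_between(0, 1, -k, k):.2%}")

# Hypothetical group (mean 50 ft, sigma 8 ft): share expected to throw 60 ft or more.
print(f"{1 - NormalDist(50, 8).cdf(60):.1%}")
```

The loop prints about 68.27, 95.45 and 99.73 per cent. The 68.26 per cent often quoted comes from doubling the rounded table value .3413.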

Standard texts on statistical methods should be consulted for a 
detailed description of the curve and its properties, the computation 
of chances and the like. Some discussion will be found in Chapter 
XV. 

The results of a large number of cases in almost every conceivable sort of physical test which can be objectively measured will approach the normal curve if the "measuring stick" has been refined to meet the performance of a given age group. For example, in a 20-foot rope climb for high school boys, if time is recorded to the nearest second, there will be a large number of boys who cannot reach the 20-foot mark even with the aid of their feet. In a situation of this sort the data will be bunched at one end of the curve, with the individuals who have been unable to climb to the top getting zero. It is important, therefore, to adjust performance at the zero end of the scale by recording individuals who made, let us say, 18, 15, 12 and 9 feet. Such a procedure spreads out the "zero" boys into four intervals instead of one and offers a possibility of grouping the data correctly.

Another example may also be of some assistance. Suppose that 
in a baseball throw for accuracy test the target is constructed to 
represent a "strike" area. The rectangle, we will assume, is painted 




on a handball wall and is 17 inches wide, 30 inches in height and 20 
inches off the ground. Ten throws are allowed at a given distance 
and the number of hits out of 10 trials scored. The distribution 
might look something like this: 


Number of hits out of 10 trials    Frequency (f)
10                                 0
[remaining rows illegible in the source]


It will be noted that 90 per cent of the frequencies are grouped in five intervals. Undoubtedly we should be able to get a wider spread of ability than this, and careful thought should be given to the reconstruction of such a test.

Two suggestions are offered for procedure when the frequency distribution is not normal:

1. Allow 20 trials instead of 10 and thus get a much wider range of ability.

2. Construct a concentric-circle target with the largest circle 5 feet in diameter and circles inside of this with diameters increasing by whole feet from 1 foot up. Counting a hit in the 1-foot circle as five points and reducing the value of a hit by one point for each larger circle, it will be possible to get a very good spread of ability in only 10 throws. There is a possibility of five times as much spread of ability as in the distribution shown above. Moreover, it would seem that accuracy in throwing is more definitely measured than in the case of a rectangular target, where a hit is counted anywhere in the area.
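Suggestion 2 can be written as a scoring rule. This sketch assumes the 1-to-5-foot circles are worth 5 down to 1 points and that a throw landing on a line is credited to the inner circle. The throw distances are invented for illustration, and `ring_score` is a hypothetical name.

```python
import math


def ring_score(distance_ft: float) -> int:
    """Points for one throw on the concentric-circle target.

    Assumes circles 1, 2, 3, 4 and 5 feet in diameter worth 5, 4, 3, 2
    and 1 points, a throw outside the 5-foot circle worth 0, and a throw
    landing exactly on a line credited to the inner circle.
    """
    smallest_circle = max(1, math.ceil(2 * distance_ft))  # diameter in feet
    return max(0, 6 - smallest_circle)


# Hypothetical distances (feet from the centre) for one performer's 10 throws.
throws = [0.3, 0.8, 1.2, 2.0, 2.7, 0.4, 1.6, 0.9, 2.4, 3.5]
total = sum(ring_score(d) for d in throws)
print(total, "points out of a possible", 5 * len(throws))
```

Ten throws can now total anywhere from 0 to 50 points instead of 0 to 10 hits. That is the fivefold spread mentioned above.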

In short, it is essential to determine performance accurately at every level of ability and see to it that the spread of ability is great enough to prevent bunching in a few intervals.




II. Measures of Central Tendency. After scores have been 
tabulated and the frequency distribution arranged, it will be de- 
sirable to find a single score or measure which may be taken as 
representative of all the scores made by the group. Though there 
are three such measures in common use, namely, the mean or com- 
mon average, the median and the mode, only the first two need 
receive attention in our discussion here. 

The Mean. For ungrouped data, that is, data that have not been put into a frequency distribution, the mean may be readily found by summing the scores and dividing by N, the number of cases. This is a somewhat tedious process when dealing with a large number of cases. For grouped series the steps are as follows:

1. Tabulate the frequency distribution as previously indicated, 
high numbers at the top and low numbers at the bottom. These 


TABLE XII
CALCULATION OF THE MEAN¹

Scores in ft.    Midpoint    f      d     fd
59.5-62.4        61.0        23     6     138
56.5-59.4        58.0        25     5     125
53.5-56.4        55.0        47     4     188
50.5-53.4        52.0        60     3     180
47.5-50.4        49.0        124    2     248
44.5-47.4        46.0        152    1     152    (+1031)
*41.5-44.4       43.0        111    0     0
38.5-41.4        40.0        121    -1    -121
35.5-38.4        37.0        92     -2    -184
32.5-35.4        34.0        67     -3    -201
29.5-32.4        31.0        45     -4    -180
26.5-29.4        28.0        20     -5    -100
23.5-26.4        25.0        13     -6    -78    (-864)
                             N = 900      Σfd = +1031 - 864 = +167

Assumed mean (A.M.) = 43.0; size of interval = 3.

$$M = \text{A.M.} + \left(\frac{\Sigma fd}{N} \times \text{size of interval}\right) = 43.0 + \left(\frac{167}{900} \times 3\right) = 43.0 + .557 = 43.56$$

*Arbitrary origin chosen. Assumed mean = midpoint of the interval 41.5-44.4.

¹These data were very kindly supplied by Misses Vinnie Gee, Clarinne Llewellyn, Ada Brown and Grace Thomas of Long Beach, Cal., and represent performances of Long Beach High School Girls in the Basketball Throw for Distance. Scores were recorded to the nearest foot.




columns are represented in the illustration, Table XII, by scores 
and (f) frequency. 

2. Arbitrarily select some interval near the middle of the distribution as the "assumed mean." See starred (*) interval. This interval may be represented by its midpoint, which according to definition is the lower limit of the class + one half the size of the interval, i.e., 41.5 + 1.5 = 43 from data in Table XII.

3. The “d” column represents deviation in intervals both ways 
from the assumed mean. Note that those intervals below the mean 
are minus or negative deviations and those above plus or positive. 

4. Multiply the frequency of each interval by its deviation, giving the "fd" column.


5. Add separately the plus fd's and the minus fd's and find the algebraic sum of the two, +167. If this is plus, the mean has been assumed too low and some quantity must be added to the assumed mean to get the real mean. If the algebraic sum (Σfd) is minus, the mean has been assumed too high and some quantity must be subtracted from the assumed mean.

6. The quantity to be added or subtracted will be found by dividing Σfd by N (the number of cases) and multiplying this by the size of the interval.

$$\frac{\Sigma fd}{N} \times \text{size of interval} = \frac{167}{900} \times 3 = .557 \text{ or } .56$$


7. Since a class interval can be represented by its midpoint, the mean will then be 43 + .56, or 43.56 feet.

In case Σfd is negative, the quantity (Σfd/N × size of interval) must be subtracted from the assumed mean.

Formula for the mean:

$$\text{Mean} = \text{Assumed mean} + \left(\frac{\text{algebraic } \Sigma fd}{N} \times \text{step interval}\right)$$
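The seven steps translate into a few lines of Python. This sketch uses the midpoints and frequencies of Table XII. The function names are hypothetical. The second function computes Σ(f × midpoint)/N directly, only as a check on the short method.

```python
# Table XII as (midpoint, frequency), highest interval first.
table_xii = [
    (61.0, 23), (58.0, 25), (55.0, 47), (52.0, 60), (49.0, 124),
    (46.0, 152), (43.0, 111), (40.0, 121), (37.0, 92), (34.0, 67),
    (31.0, 45), (28.0, 20), (25.0, 13),
]


def mean_short_method(rows, assumed_mean, interval):
    """Mean of grouped data by the assumed-mean (short) method.

    d is each interval's deviation from the assumed mean, counted in intervals.
    """
    n = sum(f for _, f in rows)
    sum_fd = sum(f * round((m - assumed_mean) / interval) for m, f in rows)
    return n, sum_fd, assumed_mean + (sum_fd / n) * interval


def mean_long_method(rows):
    """The same mean computed directly as sum(f * midpoint) / N."""
    n = sum(f for _, f in rows)
    return sum(f * m for m, f in rows) / n


n, sum_fd, mean = mean_short_method(table_xii, assumed_mean=43.0, interval=3)
print(n, sum_fd, round(mean, 2))              # 900 167 43.56
print(round(mean_long_method(table_xii), 2))  # 43.56
```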


The Median. The median is a second measure of central tendency used quite frequently in educational statistics. It may be defined as that point on the scale so located that 50 per cent of the measures fall above it and 50 per cent fall below it. In other words, it is the measure or score above and below which lie an equal number of measures or scores. Suppose we have the following set of 13 scores (in seconds) of college men in the 100-yard dash arranged in numerical order: 10?, 10?, 11?, 11?, 11?, 11⅘, (12), 12?, 12?, 12?, 13?, 13?, 13? (a "?" marks a fraction of a second that is illegible in the source). If we wish to pick out one score which is representative of the entire group of measures, we may designate the midscore or midmeasure as the median. Since there are 13 scores, the seventh score counting from either end will have six lying on each side of it and may be used to represent the median. If the number of cases is even instead of odd we must pick out a point midway between the two central cases. Suppose the last score is omitted. The two central scores then become 11⅘ (counting from the top) and 12 (counting from the bottom). A point lying exactly between the two will then be used to designate the median, that is, 11 9/10 or 11.9. In a simple series such as has been shown this central tendency is not truly the median but the midscore.

If the formula N/2 is used to find the median in a simple series, where N represents the number of cases or scores, some difficulty may be experienced until the student gets the idea that the median lies at a point which has N/2 scores on each side of it. In a simple series (that is, a series not in a frequency distribution), the student should use the formula (N + 1)/2 = (13 + 1)/2, or 7. Count then from either end to obtain the seventh score. When N is even, as 12, (N + 1)/2 = (12 + 1)/2 = 6.5. Count then from either end to the sixth and obtain the "sixth and a half case," or a score halfway between the two sixth cases counting from top and bottom. With grouped data, that is, a frequency distribution, the formula for the median is N/2.
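Here is a sketch of the (N + 1)/2 rule for a simple series. The times are invented, since not all fractions in the example above are legible. `median_simple` is a hypothetical name, and Python's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_simple(scores):
    """Median of an ungrouped series by the (N + 1)/2 rule."""
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2   # 7 for N = 13, 6.5 for N = 12
    k = int(position)
    # Count k scores in from each end; for a whole-number position both
    # counts land on the same score, otherwise on the two central scores.
    lower, upper = ordered[k - 1], ordered[-k]
    return (lower + upper) / 2


# Hypothetical times in seconds (not the book's), already in order.
times = [10.4, 10.6, 11.0, 11.2, 11.4, 11.8, 12.0,
         12.2, 12.4, 12.6, 13.0, 13.2, 13.4]
print(median_simple(times))        # odd N: the seventh score
print(median_simple(times[:-1]))   # even N: halfway between the two sixth cases
print(median(times[:-1]))          # standard-library check
```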


Scores*    Midpoint    Frequencies
41-43      42.5        1
38-40      39.5        3
35-37      36.5        6
32-34      33.5        12
29-31      30.5        9
26-28      27.5        7
23-25      24.5        3
20-22      21.5        1
                       N = 42

*These scores are arranged in intervals of three, 20-22 meaning 20.00-22.99, 23-25 representing 23.00-25.99, etc.




In grouped data the principle is the same, but the procedure is somewhat more complicated. An essential concept in treating grouped data is that the frequencies in each class must be considered as evenly distributed between the class limits, no matter how few cases the frequency represents. Let us find the median of the frequency distribution shown above.

The median equals the point where 50 per cent of scores are above and 50 per cent below, or the N/2th score, the 21st score. To find this 21st score we add up from the bottom in the frequency column the 1, 3, 7, 9, which gives us 20, and so the 21st belongs in the class represented by the midpoint of 33.5. The interval 32-34 is here taken to begin at 32 and end at 34.99. How much must we add to the lower limit of this class to give a proper value to the 21st score? Since the frequency in each class is theoretically evenly distributed, the class intervals above and below midpoint 33.5 can be represented as follows:

[Diagram: classes and intervals, with the frequencies in each class shown evenly distributed through the class; the position of the 20th score is marked.]

The value of the 21st score then equals 32 (the lower limit of the class in which case 21 is found) plus 1/12 of the distance through the interval. Inasmuch as there are three units in each interval, the amount to be added to 32 is (1/12 × 3), or .25. Therefore the median is 32.25.
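The interpolation just carried out can be written as a small function. It assumes classes are given by their real lower limits from the bottom up, with cases spread evenly through each class, as the text prescribes. `grouped_median` is an illustrative name.

```python
def grouped_median(rows, interval):
    """Median of grouped data by interpolation at the N/2th case.

    rows: (lower real limit, frequency) pairs, lowest class first.
    Cases are treated as spread evenly through each class.
    """
    n = sum(f for _, f in rows)
    target = n / 2
    below = 0                      # cases in the classes already passed
    for lower, f in rows:
        if f and below + f >= target:
            return lower + (target - below) * interval / f
        below += f
    raise ValueError("empty distribution")


# The distribution above, classes 20-22 up to 41-43 (interval of 3).
small = [(20, 1), (23, 3), (26, 7), (29, 9), (32, 12), (35, 6), (38, 3), (41, 1)]
print(grouped_median(small, interval=3))
```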

Example taken from Table XII. Problem: to find the 450th score and its corresponding value.

[Diagram: the interval 41.5-44.5, three units wide, containing 111 scores evenly distributed; the 450th score falls at 43.986, the median.]

$$\begin{aligned}
\text{Median} &= \text{the score at the 450th case} \\
&= \text{the 358th score} + \left(\frac{450 - 358}{111} \times \text{step interval}\right) \\
&= 41.5 + \left(\frac{92}{111} \times 3\right) \\
&= 41.5 + 2.486 \\
&= 43.986
\end{aligned}$$


Referring to Table XII, N/2 = 450. Starting from the bottom of the series we find that the 450th case is to be found in the interval 41.5-44.4. Adding the frequencies beginning at the bottom, the number of cases up to this interval is 358. The interval contains 111 cases, scattered evenly throughout the interval, theoretically at least. Hence, we must go (450 − 358)/111, or 92/111, of the way through the interval to reach the 450th case. The size of the interval is three, and this must be taken into account in computing the number of feet we go from the lower limit of the interval. The fraction 92/111 is multiplied by three and the resulting figure is added to the lower limit, which is 41.5. The median then equals:

$$Md = 41.5 + \left(\frac{92}{111} \times 3\right) = 41.5 + 2.486 = 43.986$$




As a check on the obtained median, we may start from the top of the series. Counting down through the interval just above the one in which the 450th case is to be found, we have 431 cases. We must therefore go (450 − 431)/111, or 19/111, of the way through the interval. The lower limit of the preceding interval is 44.5, which may be taken as representing the theoretical upper limit of the interval containing the median.

$$Md = 44.5 - \left(\frac{19}{111} \times 3\right) = 44.5 - .514 = 43.986$$

When the distribution has an odd number of cases it will be necessary to compute to a half case. Thus if there were 112 cases in the interval 41.5-44.4, making N/2 = 450.5, our procedure would be as follows:

$$Md = 41.5 + \left(\frac{92.5}{112} \times 3\right) = 41.5 + 2.478 = 43.978$$

and

$$Md = 44.5 - \left(\frac{19.5}{112} \times 3\right) = 44.5 - .522 = 43.978$$

Percentiles. The median is, of course, the 50th percentile; that is, 50 per cent of the cases lie above the 50th percentile and 50 per cent lie below. Any percentile may readily be found in the same way in which the median was computed. Referring to Table XII, the 10th percentile will be found at the 90th case from the bottom, the 20th percentile at the 180th case, and so on.

Computing the 10th percentile score we have

$$P_{10} = 32.5 + \left(\frac{90 - 78}{67} \times 3\right) = 33.04$$

Computing the 90th percentile (10th percentile from the top), we have

$$P_{90} = 56.5 - \left(\frac{90 - 48}{47} \times 3\right) = 53.82$$
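The same interpolation serves for any percentile. This sketch applies it to Table XII, counting up from the bottom only. The results agree with the 33.04, 43.986 and 53.82 found above. The function name is hypothetical.

```python
# Table XII as (lower real limit, frequency), lowest class first; interval = 3.
table_xii = [
    (23.5, 13), (26.5, 20), (29.5, 45), (32.5, 67), (35.5, 92),
    (38.5, 121), (41.5, 111), (44.5, 152), (47.5, 124), (50.5, 60),
    (53.5, 47), (56.5, 25), (59.5, 23),
]


def percentile_point(rows, interval, percent):
    """Score below which `percent` per cent of the cases fall.

    Counts up from the bottom and interpolates within the class,
    treating each class's cases as evenly spread.
    """
    n = sum(f for _, f in rows)
    target = n * percent / 100
    below = 0
    for lower, f in rows:
        if f and below + f >= target:
            return lower + (target - below) * interval / f
        below += f
    raise ValueError("percent must be between 0 and 100")


for p in (10, 50, 90):
    print(p, round(percentile_point(table_xii, 3, p), 3))
```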


The Mode. The mode is simply that interval in which the largest number of frequencies occur. Thus, in the data of Table XII, the mode will be the interval 44.5-47.4, or more specifically the midpoint of that interval, which may be represented as 46.

III. Measures of Variability. There are several ways of expressing the spread, or the tendency of scores to scatter. This is quite the opposite of the measures of central tendency, such as the average, median and mode, which attempt to express the whole range of measures by some one figure. The measures of variability, such as the range and the standard deviation, are expressions of the kind of scattering the scores have made away from the mean or average.

In order to compare two or more means or averages properly, it 
is necessary to know how widely scattered the test scores are, that 
is, how widely these test scores differ from the mean. Two distri- 
butions may have the same mean, the same number of cases and the 
same range, but yet in one the scores may be clustered around the 
average, while in the other they are spread out over the range. An 
example is given in Table XIII to show how a frequency distribution 
with the same average or mean may represent quite a different sort 
of variability. Note the even distribution of scores in one case and 
the bunched data in the other. 


TABLE XIII
SHOWING DIFFERENCES IN VARIABILITY

Scores*    A     B
90-99      2     5
80-89      7     10
70-79      11    11
60-69      20    14
50-59      15    11
40-49      9     6
30-39      6     9
20-29      3     7
N          73    73

Mean or average is 60.11 in both distributions.

*Data collected to the nearest foot.


The Range. For very rough comparison the range of scores may 
be used, that is, low and high scores indicated. Thus, in the data of 
Table XII, the range will be roughly 24-62. 

The Standard Deviation. Statisticians have found that the standard deviation is the most reliable of all measures of variability, and for that reason more emphasis will be placed upon it than upon any other mentioned here. In a normal or fairly normal distribution the standard deviation, when measured off above and below the mean, will designate the limits of the middle two thirds of the distribution (really 68.26 per cent). The standard deviation is usually denoted by the Greek letter sigma (σ). It may be defined as the square root of the mean of the squared deviations taken from the mean of the distribution. To follow out this definition literally with a distribution containing a large number of cases, such as Table XII, would be tedious, and for grouped series a short method has been devised which makes the calculation of σ fairly simple. Table XIV will serve to illustrate the calculation by the short method.
TABLE XIV
THE CALCULATION OF THE STANDARD DEVIATION
60-Yard Dash, Junior High School Girls*

Time in seconds     Midpoint    f      d      fd      fd²
and tenths
13.0-13.2           13.1        ?      10     ?       ?
12.7-12.9           12.8        ?      9      ?       ?
12.4-12.6           12.5        ?      8      ?       ?
12.1-12.3           12.2        ?      7      ?       ?
11.8-12.0           11.9        3      6      18      108
11.5-11.7           11.6        ?      5      ?       ?
11.2-11.4           11.3        ?      4      ?       ?
10.9-11.1           11.0        ?      3      ?       ?
10.6-10.8           10.7        20     2      40      80
10.3-10.5           10.4        20     1      20      20
10.0-10.2 (A.M.)    10.1        ?      0      0       0
9.7-9.9             9.8         30     -1     -30     30
9.4-9.6             9.5         59     -2     -118    236
9.1-9.3             9.2         30     -3     -90     270
8.8-9.0             8.9         65     -4     -260    1040
8.5-8.7             8.6         19     -5     -95     475
8.2-8.4             8.3         6      -6     -36     216
7.9-8.1             8.0         2      -7     -14     98

N = 299     Σfd = +157 - 643 = -486     Σfd² = 2952

(? marks an entry that is illegible in the source.)

Assumed mean (A.M.) = 10.1; size of interval = .3. The quantity Σfd/N will be designated as c, the correction.

$$c = \frac{\Sigma fd}{N} = \frac{-486}{299} = -1.625$$

$$M = \text{A.M.} + (c \times \text{size of interval}) = 10.1 + (-1.625 \times .3) = 10.1 - .49 = 9.61$$

$$\sigma = \sqrt{\frac{\Sigma fd^2}{N} - c^2} \times \text{size of interval} = \sqrt{\frac{2952}{299} - 2.642} \times .3 = \sqrt{7.231} \times .3 = .807$$

*These data were very kindly supplied by Miss Eleanor Liniger of Santa Monica, California. Data were gathered to the nearest tenth second.




The steps in the calculation of the standard deviation are as follows:

1. Calculate the fd column as previously described in the discussion of the mean.

2. Add another column (fd²), which represents the product of the d's times the corresponding fd's. Sum this column; since all the signs are positive, this sum is simply Σfd².

3. Compute a correction, c² = (Σfd/N)², because the assumed mean has not been taken exactly at the mean. This is to be subtracted from Σfd²/N.

4. Substitute the values obtained in the equation

$$\sigma = \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2} \times \text{size of interval}$$

Note that in the problem at hand the size of the interval, .3, is expressed in tenths of a second rather than in whole numbers.

The standard deviation computed in this way becomes a valuable measure of variability. In any distribution at all resembling a normal distribution it sets off, above and below the mean, the limits of the majority of cases (68 per cent +). It is possible to compare two sets of data on two different experiments with a more accurate result by comparing the limits established above and below the mean. For instance, Table XV shows two different frequency distributions with the same mean. In A the majority of records lie between the limits of 60.11 ± 1σ (16.4), or between 76.51 and 43.71. In B the limits 1σ (20.6) above and below the mean establish the limits of 68 per cent of the cases at 80.71 and 39.51.

TABLE XV
STANDARD DEVIATIONS OF TWO SETS OF SCORES HAVING THE SAME RANGE AND THE SAME MEAN

                      Distribution A                  Distribution B
Score           f     d     fd      fd²         f     d     fd      fd²
90-99           2     4     8       32          5     4     20      80
80-89           7     3     21      63          10    3     30      90
70-79           11    2     22      44          11    2     22      44
60-69           20    1     20      20          14    1     14      14
                            (+71)                           (+86)
50-59 (A.M.)    15    0     0       0           11    0     0       0
40-49           9     -1    -9      9           6     -1    -6      6
30-39           6     -2    -12     24          9     -2    -18     36
20-29           3     -3    -9      27          7     -3    -21     63
                            (-30)                           (-45)
N = 73, Σfd = 41, Σfd² = 219                    N = 73, Σfd = 41, Σfd² = 333

For A: c = Σfd/N = 41/73 = .561, and c² = .3147, so

$$\sigma_A = \sqrt{\frac{219}{73} - .3147} \times 10 = \sqrt{2.6853} \times 10 = 1.64 \times 10 = 16.4$$

For B: c = 41/73 = .561, and c² = .3147, so

$$\sigma_B = \sqrt{\frac{333}{73} - .3147} \times 10 = \sqrt{4.247} \times 10 = 2.06 \times 10 = 20.6$$
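The sketch below checks the four steps against Table XV. `sd_short_method` is a hypothetical name. The cross-check with `statistics.pstdev` assumes every score sits at its interval midpoint: 54.5 for the 50-59 interval, since data were taken to the nearest foot.

```python
import math
from statistics import pstdev


def sd_short_method(rows, interval):
    """Standard deviation of grouped data by the short method.

    rows: (f, d) pairs, d = deviation in intervals from the assumed mean.
    sigma = sqrt(sum(f*d*d)/N - (sum(f*d)/N)**2) * interval
    """
    n = sum(f for f, _ in rows)
    c = sum(f * d for f, d in rows) / n              # the correction
    mean_square = sum(f * d * d for f, d in rows) / n
    return math.sqrt(mean_square - c * c) * interval


# Table XV, with d measured from the 50-59 interval; interval = 10.
dist_a = [(2, 4), (7, 3), (11, 2), (20, 1), (15, 0), (9, -1), (6, -2), (3, -3)]
dist_b = [(5, 4), (10, 3), (11, 2), (14, 1), (11, 0), (6, -1), (9, -2), (7, -3)]

print(round(sd_short_method(dist_a, 10), 1))
print(round(sd_short_method(dist_b, 10), 1))

# Cross-check: place each case at its midpoint and take the population SD.
expanded_a = [54.5 + 10 * d for f, d in dist_a for _ in range(f)]
print(round(pstdev(expanded_a), 1))
```

Both routes give 16.4 for A, and the short method gives 20.6 for B. The short method simply avoids squaring many-digit deviations from the true mean.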

The Probable Error (P.E.). When the curve representing the frequency distribution is normal, the "probable error" is often used as a measure of variability and is expressed by the formula P.E. = .6745σ. In common usage the mean plus and minus one P.E. establishes limits including 50 per cent of the cases. This is not exactly true unless the distribution is perfectly symmetrical, and for that reason P.E. should be used with caution as a measure of variability. Inasmuch as only 50 per cent of the cases are represented, the inferences drawn cannot be as accurate as with the standard deviation, where 68 per cent of the cases are considered.
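The constant .6745 is the point on the unit normal curve that cuts off the middle 50 per cent. Below is a short Python check, assuming a normal distribution. The helper name is illustrative.

```python
from statistics import NormalDist

unit_normal = NormalDist()              # mean 0, standard deviation 1
PE_FACTOR = unit_normal.inv_cdf(0.75)   # the .6745 of the text


def probable_error(sigma):
    """P.E. = .6745 sigma; meaningful only for a roughly normal distribution."""
    return PE_FACTOR * sigma


middle_share = unit_normal.cdf(PE_FACTOR) - unit_normal.cdf(-PE_FACTOR)
print(round(PE_FACTOR, 4), round(middle_share, 3))
print(round(probable_error(0.807), 3))   # using sigma = .807 from Table XIV
```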

Quartile Deviation (Q). The quartile deviation may be defined as half the range between the 25th and 75th percentiles. In other words, it is equal to half of the middle 50 per cent of the distribution.

Obtain the 25th and 75th percentiles (Q1 and Q3), subtract Q1 from Q3 and divide by 2.

$$Q = \frac{Q_3 - Q_1}{2}$$

This measure of dispersion is most generally used with the median when a measure of central tendency and a measure of dispersion are both needed. As stated under P.E., however, the quartile deviation is not considered very accurate.
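The text gives no worked quartile deviation, so the figures below are computed here rather than quoted. The sketch applies the percentile interpolation used earlier to Table XII. The function name is hypothetical.

```python
# Table XII as (lower real limit, frequency), lowest class first; interval = 3.
TABLE_XII = [
    (23.5, 13), (26.5, 20), (29.5, 45), (32.5, 67), (35.5, 92),
    (38.5, 121), (41.5, 111), (44.5, 152), (47.5, 124), (50.5, 60),
    (53.5, 47), (56.5, 25), (59.5, 23),
]


def point_at_case(rows, interval, case):
    """Score value at the given case number, counting up from the bottom.

    Cases within a class are treated as evenly spread through it.
    """
    below = 0
    for lower, f in rows:
        if f and below + f >= case:
            return lower + (case - below) * interval / f
        below += f
    raise ValueError("case number exceeds N")


n = sum(f for _, f in TABLE_XII)
q1 = point_at_case(TABLE_XII, 3, n / 4)      # 25th percentile
q3 = point_at_case(TABLE_XII, 3, 3 * n / 4)  # 75th percentile
q = (q3 - q1) / 2
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, Q = {q:.2f}")
```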

A number of other measures of variability are used, but those listed above are the most common.

IV. The Reliability of Various Measures. Checking Results. The results obtained from calculating the mean or the standard deviation do not give us an absolutely accurate value true in all cases. They are simply the figures obtained from a limited experiment. It is of interest to have some method of checking the reliability of such statistical measures. Formulas have been worked out that give us the limits within which we may consider the values as significant. These limits are expressed in two ways. First, we state the percentage of cases in which our answer may be correct. Second, we establish an increment which, if added to or subtracted from the raw results, sets the limits within which the answer is probably true.

Sampling. It must constantly be borne in mind that the objective data secured in distributions such as have been given in Tables XI, XII and XIV are but samples of an infinite number of distributions which might be secured if time permitted. However, even with fallible data it is possible to show within what limits a true measure may be expected to fall. "The principle behind the sampling process is that a fairly large number of items chosen at random from a large group or population is very likely to have the characteristics of the whole population."⁴

Standard Error of the Mean. Students should not regard the 
obtained mean of a distribution as the true mean of the entire group 
considered, but only as a mean of a sampling in that group. Means 
of samples used to represent all of the class of which the sample is 
but one, are likely to be inaccurate due to the fact that the group 
measured is but a fallible sample. The probable extent of such 
inaccuracy may be estimated by formulas which are quite simple 
to use. 

The formula for the standard error of the mean is:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

Taking data from Table XIV we have M = 9.61, σ = .807 and N = 299. Substituting in the formula:

$$\sigma_M = \frac{.807}{\sqrt{299}} = .0467$$


The faith we can place in the mean of group measurement may now be shown. In 68 per cent of the cases the mean of the whole class represented by the group actually measured will not vary from 9.61 by more than ±1σ_M. In other words, in 68 per cent of the cases the true mean will lie between 9.56 and 9.66 seconds. It will be immediately noted that the larger the number of cases, the smaller will be σ_M and the greater will be the accuracy of obtained scores.

The Probable Error of the Mean. This is found by multiplying the standard error of the mean by .6745, or P.E._M = .6745 × .0467 = .0315. P.E._M is interpreted as follows: the chances are 50-50 that the true mean lies between 9.58 and 9.64.

The Standard Error of σ. How much faith can we put in the obtained standard deviation? Using data from Table XIV, if the distribution is approximately normal we may use the following formula:

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2N}} = \frac{.807}{\sqrt{598}} = .033$$

In 68 cases out of 100, the true standard deviation of the distribution lies between .774 and .840 seconds.

⁴Holzinger, K. J., Statistical Methods for Students in Education, p. 16. Boston, Ginn and Company, 1928.
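The sketch below reproduces the three reliability figures computed above for Table XIV. The helper names are illustrative. As the text notes, the σ_σ formula assumes an approximately normal distribution.

```python
import math


def standard_error_of_mean(sigma, n):
    return sigma / math.sqrt(n)


def standard_error_of_sigma(sigma, n):
    """Assumes an approximately normal distribution."""
    return sigma / math.sqrt(2 * n)


mean, sigma, n = 9.61, 0.807, 299      # Table XIV
se_m = standard_error_of_mean(sigma, n)
pe_m = 0.6745 * se_m
se_sigma = standard_error_of_sigma(sigma, n)

print(f"sigma_M = {se_m:.4f}: 68% limits {mean - se_m:.2f} to {mean + se_m:.2f}")
print(f"P.E._M = {pe_m:.4f}: 50% limits {mean - pe_m:.2f} to {mean + pe_m:.2f}")
print(f"sigma_sigma = {se_sigma:.3f}: 68% limits "
      f"{sigma - se_sigma:.3f} to {sigma + se_sigma:.3f}")
```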




The Reliability of a Difference between the Means of Two Comparable 
Series. Referring to Table XI suppose we wish to find what differ- 
ence exists between the mean scores made by freshmen and sopho- 
mores and how reliable that difference is. 


Freshmen                 Sophomores
M1 = 7.793               M2 = 7.362
σ1 = .526                σ2 = .574
σ_M1 = .06975            σ_M2 = .0943

Subscripts 1 and 2 are used to distinguish between the two groups. The formula for calculating the standard error of a difference is

$$\sigma_d = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2} = \sqrt{(.06975)^2 + (.0943)^2} = .1172$$

The actual difference between the means of the two groups will be designated by d; hence, d = 7.793 − 7.362 = .431. There appears to be a difference of about four-tenths of a second between the average performances of the groups, in favor of the sophomores. The question is, "Is this a significant difference?" We interpret an obtained difference in the same way in which an obtained mean is interpreted, that is, in terms of its standard error. In two other distributions of the same type obtained at some other institution, another difference will be obtained. Therefore, this obtained difference is only one sample of many. We may say that the chances are 68 in 100 that the obtained difference does not diverge from the true difference by more than ±.1172 seconds. That is, in 68 per cent of all samples the true difference will lie between .314 and .548 seconds.

For all practical purposes we may consider that all cases in a normal distribution lie within three sigma on each side of the mean. Hence, when $d/\sigma_d$ is greater than three, we can be sure that the obtained difference has complete reliability, i.e., that a difference of the same sign as that obtained actually exists. Here $d/\sigma_d = .431/.1172 = 3.68$. To compute the chances that the obtained difference will always be greater than zero, one must consult normal probability tables.⁵
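The arithmetic of this comparison can be checked with the short Python sketch below. Like the formula above, it assumes the two groups are independent samples. It also swaps the normal probability table for the standard library's `math.erf`. The function names are illustrative and do not come from the text.

```python
import math

def normal_cdf(z: float) -> float:
    """Area of the unit normal curve below z (replaces the printed table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def difference_reliability(m1, se1, m2, se2):
    """Return d, sigma_d, d/sigma_d and the chance that the true d exceeds zero,
    for two independent means with standard errors se1 and se2."""
    d = m1 - m2
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)
    ratio = d / sigma_d
    return d, sigma_d, ratio, normal_cdf(ratio)

# Freshmen vs. sophomores (mean times in seconds, from the text)
d, sigma_d, ratio, p_positive = difference_reliability(7.793, 0.06975, 7.362, 0.0943)
print(f"d = {d:.3f}, sigma_d = {sigma_d:.4f}, d/sigma_d = {ratio:.2f}")
print(f"68% band for the true difference: {d - sigma_d:.3f} to {d + sigma_d:.3f}")
print(f"chance that the true difference exceeds zero: {p_positive:.5f}")
```

Unrounded, σd comes out near .1173. This gives the same .314 to .548 band as the text.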


Selected References 

GARRETT, HENRY E.: Statistics in Psychology and Education, Second Edition, Chapters I, II, III. New York, Longmans, Green and Company, 1945. Pp. xiv and 495.


⁵ See Table XXXII, p. 392.


Elementary Statistical Methods 275 


These chapters deal with the frequency distribution, measures of central 
tendency and dispersion in an excellent manner and will be found most helpful 
to students of educational statistics. 

GOOD, WARREN R.: The Elements of Statistics. Ann Arbor, Michigan, The Ann Arbor Press, 1933. Pp. 28.

This pamphlet is intended as an introductory course in statistics. It presents in brief and simple form material which would normally occupy about 200 pages of a textbook on educational statistics. The same material is presented by the author in the Res. Quart. Am. Phys. Educ. Assoc., Vol. IV, No. 2 (May, 1933), 131-156.

GUILFORD, J. P.: Fundamental Statistics in Psychology and Education. New York, McGraw-Hill Book Company, 1942. Pp. xi and 333.

Chapters I through X deal with the importance of measurement in education and psychology, frequency distributions, measures of central tendency, measures of variability, properties of the normal curve, reliability, testing hypotheses, and the prediction of errors.

HOLZINGER, KARL J.: Statistical Methods for Students in Education, Chapters II, V, VI, VII, VIII, XIII. Boston, Ginn and Company, 1928. Pp. viii and 372.

The chapters on "Errors in Calculation and Measurement" (V) and "Sampling and Response Errors" (XIII) should be studied carefully by every student of measurement.

SORENSON, HERBERT: Statistics for Students of Psychology and Education. New York, McGraw-Hill Book Company, 1936. Pp. viii and 373.

LINDQUIST, E. F.: A First Course in Statistics. New York, Houghton Mifflin Company, 1942. Pp. xi and 109.

In addition to the usual material of a standard text, Chapter III will be found particularly helpful. It deals with the computation, use and interpretation of percentiles.

ODELL, C. W.: An Introduction to Educational Statistics. New York, Prentice-Hall, Inc., 1946. Pp. xiii and 269.

Chapters I through VI deal with tabulation and classification, graphs, measures of central tendency, and measures of variability.

SMITH, G. MILTON: A Simplified Guide to Statistics for Psychology and Education. New York, Rinehart and Company, Inc., 1946. Pp. ix and 109.

A simplified presentation of the basic statistical materials which the student of education should understand. See especially the first chapter, on the need for statistics.

THURSTONE, L. L.: The Fundamentals of Statistics. New York, The Macmillan Company, 1928. Pp. xvi and 237.

This reference contains the usual discussion of measures of central tendency and variability. It is particularly valuable, however, for other material which will be cited elsewhere.

WALKER, HELEN M.: Elementary Statistical Methods, Chapters I, II, III, IV, V, VI, VII, VIII, IX, X, and XI. New York, Henry Holt and Company, 1943. Pp. xxv and 368.

This text is an excellent one for the beginning student of statistics. The first chapter, "The Nature of Measurement," will give the student a clear understanding of the purpose of measurement and the significance of numbers.


CHAPTER XIV 


Elementary Statistical Methods 
(Continued) 


Correlation. Meaning of Correlation. It is very important to have 
a knowledge of the relationship which exists between one capacity 
and another. We may want to know, for example, in physical edu- 
cation, whether there is any relationship between long arms and 
ability to throw a baseball, between long legs and ability to run fast 
or between performance as displayed in one event and that displayed 
in another; in other words, whether there is a marked tendency for 
students who do well in one particular thing to do well also in 
another. There are many ways of computing relationship or corre- 
lation, but the one presented here is the one in almost universal use 
when dealing with fifty or more cases. Texts on statistical methods 
should be consulted for other methods. 

In expressing relationships in a quantitative way, "r" is the most commonly used symbol. This is designated as the "coefficient of correlation." The coefficient of correlation r may vary all the way from +1 to -1, that is, from perfect positive relationship to perfect negative relationship.

Suppose that the runners who had the most speed could always jump the farthest, those who had medium speed were medium jumpers, and those who were the slowest were also the poorest jumpers. If such a situation were actually true, the result would be a perfect, or at least an extremely high, positive correlation. Suppose again that the fastest runners were always the poorest rope climbers, and vice versa. Such a situation would result in a perfect or nearly perfect negative correlation, approaching -1.

A third extreme situation which might arise is one like this. There are as many good runners who are also good rope climbers as there are poor runners who are good rope climbers. Likewise, there are as many poor runners who are poor rope climbers as there are good runners who are poor rope climbers. Such a combination results in no relationship at all, or a correlation of zero.

TABLE XVI

Basketball throw for distance (to nearest foot) and 50-yard dash (seconds and fifths; 7-1 = 7 1/5 seconds)

Case  Throw  Dash  ||  Case  Throw  Dash  ||  Case  Throw  Dash
  1     43   7-1   ||   21     59   8-0   ||   41     52   7-4
  2     46   7-4   ||   22     55   7-4   ||   42     56   7-0
  3     46   7-3   ||   23     67   6-4   ||   43     48   7-2
  4     65   7-1   ||   24     62   7-2   ||   44     54   7-2
  5     39   7-4   ||   25     31   7-4   ||   45     58   8-4
  6     57   7-4   ||   26     49   8-1   ||   46     40   7-2
  7     48   8-1   ||   27     64   7-0   ||   47     41   7-4
  8     32   7-3   ||   28     56   7-2   ||   48     62   8-2
  9     47   7-4   ||   29     45   8-0   ||   49     58   8-0
 10     56   8-2   ||   30     46   7-4   ||   50     61   8-0
 11     57   8-4   ||   31     45   7-2   ||   51     53   7-3
 12     60   8-0   ||   32     47   8-0   ||   52     47   6-4
 13     56   8-0   ||   33     57   7-4   ||   53     51   9-0
 14     58   7-2   ||   34     44   7-4   ||   54     69   7-1
 15     55   8-1   ||   35     48   8-2   ||   55     50   7-1
 16     54   8-1   ||   36     47   6-4   ||
 17     52   9-2   ||   37     51   7-3   ||
 18     47   8-2   ||   38     57   8-1   ||
 19     48   7-4   ||   39     62   8-0   ||
 20     40   7-1   ||   40     61   7-4   ||

Extremes such as these rarely, if ever, happen. In dealing with human traits there are many variables besides the two in question which have a bearing upon the situation.

Pearson Product-moment Method of Computing a Correlation Coefficient. The calculation of a product-moment r can best be illustrated by showing the computations involved. Figure 10, called a scatter diagram, will serve as such an illustration. The problem is to compute the relationship, if any, existing between time in running and ability to throw a basketball for distance.¹


¹ The data of Table XVI were kindly supplied by Miss Hazel Cubberley, University of California at Los Angeles. They represent abilities of freshman women majors in physical education. Note that abilities must be paired to be of value in the computation of r.


[Fig. 10: scatter diagram. Y-variable (left margin): basketball throw for distance, in feet. X-variable (top): 50-yard dash time, in seconds and fifths.]


Fig. 10. Calculation of the product-moment coefficient of correlation between 
ability in throwing basketball and speed in running of 55 college women. 


Steps in the Computation of r from a Scatter Diagram. 1. Decide upon the intervals to be used for each of the variables. This has been discussed previously.² These intervals should be listed on the left of the diagram (Y-variable) and at the top (X-variable). Care must be taken that the intervals run from lower numbers to higher, as indicated by the circle and arrows in the lower left-hand corner of the diagram. Note that while 6-4 is a better time than 9-2, it is a lower number and should therefore be placed on the left of the diagram.


² See p. 253.




2. Starting with Case No. 1, in Table XVI, make a tally in the 
proper cell for each pair of performances. The cell of Case No. 1 has 
been heavily ruled. The tallies are placed in the upper left-hand 
corner. 

3. When tallying has been completed, assume as a mean for each variable the midpoint of an interval approximately in the middle of each distribution. Rule off the cells in the row and the column in which the assumed mean lies, using heavy lines or colored pencil as indicated in Fig. 10. Compute the correction ($c$) for each variable, $c_x$ and $c_y$,³ and the standard deviation of each variable, $\sigma_x$ and $\sigma_y$,⁴ in terms of deviation units, that is, without multiplying by the size of the interval. This procedure is followed to simplify the final calculation of r.

4. Each cell deviates from the assumed origin of the diagram, that is, the intersection of the row and the column containing the assumed means of the two variables. Measured in cell deviation units, this deviation equals the product of the cell distance in one direction by the cell distance in the other. The four areas in point of direction from the assumed origin are upper right, upper left, lower left and lower right, or quadrants I, II, III and IV respectively. To the right is positive, to the left negative, upward positive and downward negative. Hence, cell deviations in quadrants I and III will be positive (plus times plus, and minus times minus), and those in quadrants II and IV will be negative (minus times plus, and plus times minus).

For the cell of Case No. 1, the product deviation from the assumed origin is (-3) × (-2) = +6. This figure may be placed in the lower right-hand corner of the cell, as indicated in Fig. 10. Product deviations of cells must be multiplied by the frequencies in the cell. Since the cell representing Case No. 1 has a frequency of one, its product-moment will be 1 × (+6), or +6. If, however, there were three frequencies in the cell, its product-moment would be 3 × (+6), or +18.

Computations for Fig. 10 (in deviation units):

$$c_y = \frac{83 - 52}{55} = \frac{31}{55} = .564 \qquad c_x = \frac{-56 + 54}{55} = \frac{-2}{55} = -.036$$

$$\sigma_y = \sqrt{\frac{501}{55} - .318} = 2.965 \qquad \sigma_x = \sqrt{\frac{398}{55} - .001} = 2.690$$

$$\sigma_x \text{ (actual units)} = 2.690 \times \tfrac{1}{5} = .538 \text{ seconds}$$

$$r = \frac{\frac{-41}{55} - (.564 \times -.036)}{2.965 \times 2.690} = -.091$$

$$PE_r = \frac{.6745\,[1 - (-.091)^2]}{\sqrt{55}} = .09$$

The chances are 50-50 that the true correlation will lie between -.091 + .09 and -.091 - .09, that is, between zero and -.18.

³ See p. 265.
⁴ See p. 270.

5. When the product-moments of all cells have been computed with due regard for signs, the entries in the xy column may be made. Note that product-moments of cells have been circled to facilitate addition. Each entry of the xy column represents the algebraic sum of all the product-moments in a particular row. Thus, in the first row at the top, -21 is the only figure, but in the fourth row from the top the algebraic sum is (+12 + 12 - 8), or +16. After the xy's for each row are listed, the algebraic sum of all of them, $\Sigma xy$, is computed. In the problem at hand this sum is -41. As a check, the xy's for the columns should be computed and added. The two values of $\Sigma xy$ thus obtained must agree if there has been no error in computation.


6. Compute the coefficient of correlation, r, by means of the formula

$$r = \frac{\Sigma xy}{N\,\sigma_x\,\sigma_y}$$

Since deviations are taken from assumed means, a correction to $\Sigma xy$ is necessary. The formula for use then becomes

$$r = \frac{\dfrac{\Sigma xy}{N} - c_x c_y}{\sigma_x\,\sigma_y}$$

The corrections $c_x$ and $c_y$, and the standard deviations $\sigma_x$ and $\sigma_y$, are all in terms of deviation units and have not been multiplied by units representing "size of interval."

The correlation actually obtained, -.091, is very small. The probable error⁵ is also large in proportion to the size of r. We can therefore almost conclude that there never will be a relationship between these abilities of such a value as to be useful in predicting success in one ability from performance in the other. However, conclusions from the data of fifty-five cases ought never to be given great weight. Several hundred cases would undoubtedly give a better sampling and might show an entirely different relationship.

⁵ The probable error of an r may be computed by the following formula:

$$PE_r = \frac{.6745\,(1 - r^2)}{\sqrt{N}}$$

where r is the value of the correlation and N the number of cases.
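Steps 3 and 6 can be checked with the short Python sketch below. The dash times are typed in from Table XVI. The sketch assumes that the three entries printed as 7-8 should be read as 7-3, since times were recorded only to the nearest fifth of a second. With the assumed mean at 7-4, this reading gives the sums behind $c_x$ and $\sigma_x$ (Σx = -2, Σx² = 398). The throw-side values ($c_y$ = 31/55, $\sigma_y$ = 2.965) and Σxy = -41 are taken as given from Fig. 10 rather than recomputed. The helper names are illustrative.

```python
import math

# 50-yd. dash times, Table XVI cases 1-55, written as "seconds-fifths".
# Assumption: the entries printed as "7-8" are read as "7-3".
DASH = (
    "7-1 7-4 7-3 7-1 7-4 7-4 8-1 7-3 7-4 8-2 "
    "8-4 8-0 8-0 7-2 8-1 8-1 9-2 8-2 7-4 7-1 "
    "8-0 7-4 6-4 7-2 7-4 8-1 7-0 7-2 8-0 7-4 "
    "7-2 8-0 7-4 7-4 8-2 6-4 7-3 8-1 8-0 7-4 "
    "7-4 7-0 7-2 7-2 8-4 7-2 7-4 8-2 8-0 8-0 "
    "7-3 6-4 9-0 7-1 7-1"
).split()

def to_fifths(time: str) -> int:
    """Convert 'seconds-fifths' (e.g. '7-4') to a count of fifths (39)."""
    seconds, fifths = time.split("-")
    return 5 * int(seconds) + int(fifths)

def correction_and_sigma(devs):
    """Correction c and standard deviation, both in deviation (interval) units."""
    n = len(devs)
    c = sum(devs) / n
    sigma = math.sqrt(sum(d * d for d in devs) / n - c * c)
    return c, sigma

def r_from_scatter(sum_xy, n, c_x, c_y, sigma_x, sigma_y):
    """Product-moment r from assumed-mean sums, everything in deviation units."""
    return (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)

def probable_error_r(r, n):
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# Deviations, in one-fifth-second intervals, from the assumed mean 7-4
assumed = to_fifths("7-4")
dev_x = [to_fifths(t) - assumed for t in DASH]
c_x, sigma_x = correction_and_sigma(dev_x)

r = r_from_scatter(-41, 55, c_x, 31 / 55, sigma_x, 2.965)
pe = probable_error_r(r, 55)
print(f"sum x = {sum(dev_x)}, sum x^2 = {sum(d * d for d in dev_x)}")
print(f"c_x = {c_x:.3f}, sigma_x = {sigma_x:.3f} units = {sigma_x / 5:.3f} s")
print(f"r = {r:.3f}, PE_r = {pe:.3f}")
```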
Regression. The prediction of success in one ability from performance in another brings us immediately to the problem of regression and regression lines. Suppose the units on the two axes are taken so that $\sigma_x$ represents the same distance on the X axis as $\sigma_y$ represents on the Y axis. Suppose also that high values of X are accompanied uniformly by high values of Y, and low values of X by low values of Y. A simple scatter diagram may then be represented as in Fig. 11, with the crosses indicating the scores. In such a situation the ratios $y/x$ and $x/y$ are always equal to each other, and r = 1.00. This is because the regression line is the diagonal of a square, and the tangent of an angle of 45° is 1.00.

Such a situation does not occur in actual practice, however. A scatter diagram will normally have two regression lines: one is the best-fit line passing through the means of the rows, and the other is the best-fit line passing through the means of the columns (see Fig. 12). Suppose we have two distributions and choose the width and height of the cells so that the two standard deviations are represented by equal distances on the two axes. We shall then have a scatter diagram resembling Fig. 12.


Fig. 11. A regression line. 


The means of the rows will be designated by the +'s and the means of the columns by the o's. The "X on Y" line, that is, the best-fit line for the means of the rows, will give the most probable value of variable X for a given value of variable Y. The "Y on X" line will likewise give the most probable value of Y for a given value of X. In the problem at hand, r = .625. In the construction of the diagram, the standard deviations of both variables represent equal distances on the two axes. The ratios $x/y$ and $y/x$, which are the slopes of the two regression lines, will therefore represent the correlation between the two variables. Thus, on the "X on Y" line the ratio $x/y = 5/8 = .625$, and on the "Y on X" line the ratio $y/x = 5/8 = .625$. The distances 5 and 8 are measured in cell units from the means of the distributions.

Raw scores have been reduced to T-scores to make the σ's equal.

[Fig. 12: scatter diagram in T-score intervals. X-variable (top): shot-put. Y-variable (side): baseball throw. Both regression lines are drawn through the row and column means.]

Fig. 12. Coefficient of correlation (shown graphically) between baseball throw for distance and shot-put; 200 cases of college men.

It will be noted that both regression lines pass through the point where the mean lines cross. The "best-fit" lines are therefore of value in showing graphically two kinds of change. The X on Y line shows the change in average shot-put performance which accompanies a given change in baseball throw. The Y on X line shows the change in baseball throw performance which accompanies a given change in shot-put.

Aside from showing graphically the relationship between two variables, little use is made of the actual regression lines on a scatter diagram. It is just as simple, and probably more accurate, to compute the changes which take place by means of regression equations. In deviation form⁶ these are

$$x = r\,\frac{\sigma_x}{\sigma_y}\,y = .625y \qquad (1)$$

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x = .625x \qquad (2)$$


It so happens that, in the example given, the standard deviations were both made equal by reducing raw scores to T-scale values.⁷ This procedure may not be necessary in actual practice, and differences in the S.D. values will then have to be taken into account. The standard deviations of the two variables, not converted into T-scale units, are as follows:

X-variable (shot-put) = 4.25 feet
Y-variable (baseball throw) = 23.0 feet

Hence equations (1) and (2) above will become

$$x = .625 \times \frac{4.25}{23.0}\,y = .1155y$$

$$y = .625 \times \frac{23.0}{4.25}\,x = 3.382x$$


Suppose we wish to estimate the most probable score in one variable (X), knowing the score in another (Y). It is usually more convenient to employ the score form of the regression equation than to convert variable Y into a deviation from its mean. Let $M_x$ equal the mean of the X variable (28 feet) and $M_y$ equal the mean of the Y variable (162.5 feet). Then, substituting in equations (1) and (2), since $x = X - M_x$ and $y = Y - M_y$, we have

⁶ Small x and y are used to denote deviations from the means of the X and Y distributions.
⁷ McCall, W. A., Measurement, pp. 505-508. New York, The Macmillan Company.




$$X - M_x = r\,\frac{\sigma_x}{\sigma_y}\,(Y - M_y)$$

or

$$X = r\,\frac{\sigma_x}{\sigma_y}\,(Y - M_y) + M_x \qquad (3)$$

and

$$Y - M_y = r\,\frac{\sigma_y}{\sigma_x}\,(X - M_x)$$

or

$$Y = r\,\frac{\sigma_y}{\sigma_x}\,(X - M_x) + M_y \qquad (4)$$

Substituting actual values from the problem under discussion, we have

$$X = .625 \times \frac{4.25}{23.0}\,(Y - 162.5) + 28.0 = .1155Y + 9.23 \qquad (5)$$

and

$$Y = .625 \times \frac{23.0}{4.25}\,(X - 28.0) + 162.5 = 3.382X + 67.80 \qquad (6)$$

Suppose that a man's baseball throw is 175 feet. What is his most likely shot-put distance? Substituting in (5), we have

$$X = (.1155 \times 175) + 9.23 = 29.43 \text{ ft., or 29 ft., 5 in.}$$

Again, what is the most probable baseball throw for a man whose shot-put record is 32 feet, 3 inches? Substituting in (6), we have

$$Y = (3.382 \times 32.25) + 67.80 = 176.87 \text{ ft., or 176 ft., 10 in.}$$
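Equations (3) to (6) and the two predictions can be reproduced with the Python sketch below. `regression_line` is a hypothetical helper written for illustration. Carrying the slope unrounded gives an intercept near 67.79 for equation (6) and a predicted throw of about 176.9 ft. Both agree closely with the rounded figures worked by hand.

```python
def regression_line(r, sd_pred, sd_known, mean_pred, mean_known):
    """Slope and intercept of the score-form regression of the predicted
    variable on the known one: pred = slope * known + intercept."""
    slope = r * sd_pred / sd_known
    intercept = mean_pred - slope * mean_known
    return slope, intercept

R = 0.625
SD_SHOT, SD_THROW = 4.25, 23.0          # feet, raw (not T-scale) units
MEAN_SHOT, MEAN_THROW = 28.0, 162.5     # feet

# X (shot-put) on Y (baseball throw), equation (5)
b_xy, a_xy = regression_line(R, SD_SHOT, SD_THROW, MEAN_SHOT, MEAN_THROW)
# Y (baseball throw) on X (shot-put), equation (6)
b_yx, a_yx = regression_line(R, SD_THROW, SD_SHOT, MEAN_THROW, MEAN_SHOT)

shot = b_xy * 175 + a_xy        # man who throws the baseball 175 ft.
throw = b_yx * 32.25 + a_yx     # man who puts the shot 32 ft. 3 in.
print(f"X = {b_xy:.4f}Y + {a_xy:.2f}")
print(f"Y = {b_yx:.3f}X + {a_yx:.2f}")
print(f"most probable put:   {shot:.2f} ft.")
print(f"most probable throw: {throw:.1f} ft.")
```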


The Standard Error of Estimate. The most probable values found for X and Y are, of course, the best we can do in the way of prediction from the data given. They are, however, only predictions and are not to be regarded as perfectly accurate estimates. We can nevertheless know within what limits the true values of X and Y are likely to fall. These limits may be obtained from the following formula. It is referred to as the "standard error of estimate" when estimates in one variable or performance are made from a correlated variable or performance:

$$\sigma_{(\text{est})} = \sigma_{(\text{variable})}\sqrt{1 - r^2}$$

For variable X in the problem at hand, this equation will be

$$\sigma_{(\text{est } X)} = \sigma_x\sqrt{1 - r^2} = 4.25\sqrt{1 - (.625)^2} = 3.32$$

In interpreting this figure, we may say that in 68 per cent of the cases the true value of X will fall within the limits 29.43 ± 3.32. That is, it will fall between 26 feet, 1 inch and 32 feet, 9 inches.




The probable error of estimate is obtained by multiplying the standard error by 0.6745. Thus

$$PE_{(\text{est } X)} = .6745\,\sigma_x\sqrt{1 - r^2} = .6745 \times 4.25\sqrt{1 - (.625)^2} = 2.24$$

Interpreting, in 50 per cent of the cases (that is, the chances are 50-50) the true value of X will fall within the limits 29.43 ± 2.24. That is, it will fall between 27 feet, 2 inches and 31 feet, 8 inches.

It should be quite plain that the standard error of estimate and the probable error of estimate depend entirely upon the σ's of the distributions and upon the correlation between the two variables. As r approaches 1, the standard error of estimate will approach zero. It should be observed, however, that when r = .99 the standard error of estimate is still approximately one-seventh of the standard deviation of the distribution. When r = .866, the standard error of estimate equals one-half of the standard deviation of the distribution. Hence, correlations of .3 and .4 are practically worthless for prediction purposes, being only slightly better than a guess.
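The standard and probable errors of estimate are sketched below in Python. The sketch also prints a short table of the factor √(1 - r²), which shows how slowly the error shrinks as r rises. It uses the shot-put figures of the text. The function names are illustrative.

```python
import math

def se_estimate(sd, r):
    """Standard error of estimate: sd * sqrt(1 - r^2)."""
    return sd * math.sqrt(1 - r * r)

def pe_estimate(sd, r):
    """Probable error of estimate: 0.6745 times the standard error."""
    return 0.6745 * se_estimate(sd, r)

se = se_estimate(4.25, 0.625)
pe = pe_estimate(4.25, 0.625)
print(f"sigma(est) = {se:.2f} ft., PE(est) = {pe:.2f} ft.")
print(f"68% limits: {29.43 - se:.2f} to {29.43 + se:.2f} ft.")

# Fraction of the original standard deviation that remains at various r
for r in (0.3, 0.4, 0.625, 0.866, 0.99):
    print(f"r = {r:<5}  sqrt(1 - r^2) = {se_estimate(1.0, r):.3f}")
```

The table bears out the remark above. At r = .4 the error is still about nine-tenths of the original spread, while at r = .866 it falls to one-half.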

Correlation Ratio (Nonlinear Relationship). Space does not permit an explanation of all the statistical procedures involved in the calculation of relationships. One such important procedure is the correlation ratio, designated by the Greek letter eta (η). The calculation of η is left as a special problem for the interested student, who should consult texts on statistical methods.⁸

Partial and Multiple Correlation. In dealing with the relationship which exists between two physical traits, or two physical performances, we may wish to rule out, or keep constant, certain other influences which have a bearing on the relationship of the two variables. The following problem illustrates the effect of an uncontrolled factor on correlation. For 408 college men, the correlation between weight and ability in the shot-put is .520. Since height and weight in college men have a marked relationship, it is desired to determine the relationship between the shot-put and weight with height held constant. This is accomplished by means of partial correlation. Specifically, the coefficient of partial correlation represents the net relationship existing between two variables when other variables have been ruled out or held constant. Such other variables may affect the true correlation by raising it or lowering it.

⁸ See Garrett, H. E., Statistics in Psychology and Education, Second Edition. New York, Longmans, Green and Company, 1945; and K. J. Holzinger, Statistical Methods for Students in Education, Chapter X. Boston, Ginn and Company, 1928.


Variable 1 = shot-put        Simple correlation coefficients:
Variable 2 = weight              r12 = .520
Variable 3 = height              r13 = .395
                                 r23 = .583

$r_{12.3}$ will designate the correlation between shot-put and weight, holding height constant.

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (7)$$

Substituting the values of the simple correlation coefficients in (7):

$$r_{12.3} = \frac{.520 - .395 \times .583}{\sqrt{1 - (.395)^2}\,\sqrt{1 - (.583)^2}} = .389$$

Thus it may be seen that the true correlation between shot-put ability and weight is materially reduced when height is held constant.

In like manner, we may find the partial coefficient of correlation of shot-put with height, holding weight constant.

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \qquad (8)$$

$$r_{13.2} = \frac{.395 - .520 \times .583}{\sqrt{1 - (.520)^2}\,\sqrt{1 - (.583)^2}} = .132$$

Approximately twice as much reduction is made in the net correlation coefficient here as in the case of shot-put and weight with height held constant. Weight, then, plays a much more important part in shot-put performance than height.

Suppose now we wish to discover the combined effect which height and weight have on performance in the shot-put. Letting R stand for the coefficient of multiple correlation, the formula for use is

$$R_{1.23} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)} \qquad (9)$$

or, as a check on the value thus obtained,

$$R_{1.23} = \sqrt{1 - (1 - r_{13}^2)(1 - r_{12.3}^2)} \qquad (10)$$

Substituting the proper values in (9):

$$R_{1.23} = \sqrt{1 - [1 - (.520)^2]\,[1 - (.132)^2]} = .533$$

or, in (10):

$$R_{1.23} = \sqrt{1 - [1 - (.395)^2]\,[1 - (.389)^2]} = .533$$




Here again it may be seen that height adds practically nothing to 
the correlation already existing between weight and performance 
in the shot-put. This may be interpreted as meaning that we are 
able to predict the ability to put the shot practically as well from 
weight alone as we are from both weight and height. 
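Formulas (7) to (10) are sketched in Python below. Carried without intermediate rounding, the partials come out near .388 and .132 and R near .532. The text's .389 and .533 therefore reflect rounding. Both routes to R agree exactly, as the check in (10) intends. The function names are illustrative.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial r_ab.c (a with b, c held constant), as in (7) and (8)."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

def multiple_R(r_1a, r_1b_a):
    """R_1.23 from one zero-order r and the complementary partial, as in (9)/(10)."""
    return math.sqrt(1 - (1 - r_1a**2) * (1 - r_1b_a**2))

r12, r13, r23 = 0.520, 0.395, 0.583    # 1 = shot-put, 2 = weight, 3 = height
r12_3 = partial_r(r12, r13, r23)       # shot-put with weight, height constant
r13_2 = partial_r(r13, r12, r23)       # shot-put with height, weight constant
R_a = multiple_R(r12, r13_2)           # equation (9)
R_b = multiple_R(r13, r12_3)           # equation (10), the check
print(f"r12.3 = {r12_3:.3f}, r13.2 = {r13_2:.3f}")
print(f"R1.23 = {R_a:.3f} by (9), {R_b:.3f} by (10)")
```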

A regression equation can estimate one variable or performance from the other two. To set one up, the standard deviations of the variables must be known. From these, the standard deviations of the second order are then computed.

σ1 = 3.57 ft.        M1 = 30.18
σ2 = 15.785 lbs.     M2 = 156.2
σ3 = 2.535 in.       M3 = 68.53

The regression coefficients in a problem of this nature will be denoted by $b_{12.3}$ and $b_{13.2}$. These may be computed from the equations

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}} \qquad (11)$$

and

$$b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}} \qquad (12)$$

In order to calculate these, $\sigma_{1.23}$, $\sigma_{2.13}$ and $\sigma_{3.12}$ must be known.

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} \qquad (13)$$

$$\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} \qquad (14)$$

$$\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} \qquad (15)$$

Substituting the proper values in (13), (14) and (15), we have

$$\sigma_{1.23} = 3.57\sqrt{1 - (.520)^2}\,\sqrt{1 - (.132)^2} = 3.02$$

$$\sigma_{2.13} = 15.785\sqrt{1 - (.583)^2}\,\sqrt{1 - (.389)^2} = 11.80$$

$$\sigma_{3.12} = 2.535\sqrt{1 - (.583)^2}\,\sqrt{1 - (.132)^2} = 2.04$$

The regression coefficients necessary will then be

$$b_{12.3} = .389 \times \frac{3.02}{11.80} = .0995 \qquad b_{13.2} = .132 \times \frac{3.02}{2.04} = .1954$$

The regression equation in score form to be used in estimating shot-put performance, knowing height and weight, will be

$$\bar{X}_1 = b_{12.3}\,(X_2 - M_2) + b_{13.2}\,(X_3 - M_3) + M_1$$

(When a variable is to be estimated from one or more others, it is written with a line over it, thus: $\bar{X}_1$.)

$$\bar{X}_1 = .0995\,(X_2 - 156.2) + .1954\,(X_3 - 68.53) + 30.18 = .0995X_2 + .1954X_3 + 3.28 \qquad (16)$$

What is the most probable performance of a college man in the 12-lb. shot-put if his height is 70 inches and his weight is 160 pounds? Substituting in (16), we have

$$\bar{X}_1 = (.0995 \times 160) + (.1954 \times 70) + 3.28 = 32.88 \text{ ft., or 32 ft., } 10\tfrac{1}{2} \text{ in.}$$

We must now compute the reliability of this estimate by means of the formula for the standard error of estimate:

$$\sigma_{(\text{est } X_1)} = \sigma_{1.23} = 3.02$$

Thus it may be said that the chances are 68 in 100 that the individual's true performance in the shot-put will lie within 32.88 ± 3.02. That is, it will lie between 29 feet, 10 inches and 35 feet, 11 inches.

The coefficient of multiple correlation is frequently computed by the formula

$$R_{1.23} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = \sqrt{1 - \frac{(3.02)^2}{(3.57)^2}} = .533 \qquad (17)$$
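The second-order standard deviations, the regression coefficients and the prediction from equation (16) can be checked with the Python sketch below. It uses the rounded partials .389 and .132 and σ3 = 2.535 in., as listed in the text. `predict_shot` evaluates equation (16) with the text's coefficients and intercept, which are taken as given. All function names are illustrative.

```python
import math

def sd_second_order(sd, r_first, r_partial):
    """sigma_a.bc = sigma_a * sqrt(1 - r1^2) * sqrt(1 - r2^2), as in (13)-(15)."""
    return sd * math.sqrt(1 - r_first**2) * math.sqrt(1 - r_partial**2)

r12, r13, r23 = 0.520, 0.395, 0.583
r12_3, r13_2 = 0.389, 0.132            # partials as rounded in the text
s1, s2, s3 = 3.57, 15.785, 2.535       # shot-put (ft.), weight (lbs.), height (in.)

s1_23 = sd_second_order(s1, r12, r13_2)     # (13)
s2_13 = sd_second_order(s2, r23, r12_3)     # (14)
s3_12 = sd_second_order(s3, r23, r13_2)     # (15)
b12_3 = r12_3 * s1_23 / s2_13               # (11)
b13_2 = r13_2 * s1_23 / s3_12               # (12)
R = math.sqrt(1 - (s1_23 / s1) ** 2)        # (17)

def predict_shot(weight, height):
    """Score-form equation (16), using the text's rounded coefficients."""
    return 0.0995 * weight + 0.1954 * height + 3.28

est = predict_shot(160, 70)
print(f"sigma1.23 = {s1_23:.2f}, b12.3 = {b12_3:.4f}, b13.2 = {b13_2:.4f}, R = {R:.3f}")
print(f"estimate = {est:.2f} ft., 68% limits {est - s1_23:.2f} to {est + s1_23:.2f} ft.")
```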


The Computation of Classification Indices or Formulas. The computation of a classification index using the factors of age, height and weight involves the solution of a four-variable problem in partial and multiple correlation. So that the student may become familiar with the techniques used, a sample problem is offered here.

The data presented here are taken from material used in California in setting up achievement scales for boys in the Basketball Throw for Distance.


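Variable            Standard deviation      Zero-order correlations
1. Performance      16.60 ft.               r12 = .713     r23 = .751
2. Age              1.75 yrs.               r13 = .711     r24 = .733
3. Height           4.75 in.                r14 = .709     r34 = .890
4. Weight           26.0 lbs.

Six partial correlation coefficients of the first order are necessary in order to get the three required partials of the second order.

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\sqrt{1 - r_{23}^2}} = \frac{.713 - .711 \times .751}{.703 \times .660} = .386$$

$$r_{14.3} = \frac{r_{14} - r_{13}r_{34}}{\sqrt{1 - r_{13}^2}\sqrt{1 - r_{34}^2}} = \frac{.709 - .711 \times .890}{.703 \times .456} = .237$$

$$r_{24.3} = \frac{r_{24} - r_{23}r_{34}}{\sqrt{1 - r_{23}^2}\sqrt{1 - r_{34}^2}} = \frac{.733 - .751 \times .890}{.660 \times .456} = .216$$

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\sqrt{1 - r_{23}^2}} = \frac{.711 - .713 \times .751}{.701 \times .660} = .378$$

$$r_{14.2} = \frac{r_{14} - r_{12}r_{24}}{\sqrt{1 - r_{12}^2}\sqrt{1 - r_{24}^2}} = \frac{.709 - .713 \times .733}{.701 \times .680} = .391$$

$$r_{34.2} = \frac{r_{34} - r_{23}r_{24}}{\sqrt{1 - r_{23}^2}\sqrt{1 - r_{24}^2}} = \frac{.890 - .751 \times .733}{.660 \times .680} = .758$$

The three partial correlation coefficients of the second order are as follows:

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\sqrt{1 - r_{24.3}^2}} = \frac{.386 - .237 \times .216}{.972 \times .976} = .353$$

$$r_{13.24} = \frac{r_{13.2} - r_{14.2}r_{34.2}}{\sqrt{1 - r_{14.2}^2}\sqrt{1 - r_{34.2}^2}} = \frac{.378 - .391 \times .758}{.920 \times .652} = .137$$

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}r_{34.2}}{\sqrt{1 - r_{13.2}^2}\sqrt{1 - r_{34.2}^2}} = \frac{.391 - .378 \times .758}{.926 \times .652} = .173$$

Four third-order standard deviations will be required, as follows:

$$\sigma_{1.234} = \sigma_1\sqrt{1 - r_{12}^2}\sqrt{1 - r_{13.2}^2}\sqrt{1 - r_{14.23}^2} = 16.6 \times .701 \times .926 \times .985 = 10.62$$

$$\sigma_{2.134} = \sigma_2\sqrt{1 - r_{23}^2}\sqrt{1 - r_{24.3}^2}\sqrt{1 - r_{12.34}^2} = 1.75 \times .660 \times .976 \times .936 = 1.056$$

$$\sigma_{3.124} = \sigma_3\sqrt{1 - r_{23}^2}\sqrt{1 - r_{34.2}^2}\sqrt{1 - r_{13.24}^2} = 4.75 \times .660 \times .652 \times .991 = 2.025$$

$$\sigma_{4.123} = \sigma_4\sqrt{1 - r_{34}^2}\sqrt{1 - r_{24.3}^2}\sqrt{1 - r_{14.23}^2} = 26.0 \times .456 \times .976 \times .985 = 11.40$$

The partial regression coefficients are:

$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}} = .353 \times \frac{10.62}{1.056} = 3.55$$

$$b_{13.24} = r_{13.24}\,\frac{\sigma_{1.234}}{\sigma_{3.124}} = .137 \times \frac{10.62}{2.025} = .720$$

$$b_{14.23} = r_{14.23}\,\frac{\sigma_{1.234}}{\sigma_{4.123}} = .173 \times \frac{10.62}{11.40} = .161$$

Under normal circumstances these regression coefficients would be used in a regression equation to predict performance ability from the factors of age, height and weight. In the particular case at hand, however, we wish only to know how each of these factors influences performance, and the regression coefficients tell us exactly that.

The index then is 3.55A + 0.720H + 0.161W. Since this is an algebraic expression, multiplying every term by the same number leaves the relative weights unchanged. We can therefore multiply all terms by a number which will make the age factor 20.

Hence, in the Basketball Throw for Elementary School Boys, the best classification index is 20A + 4.06H + 0.91W.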
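Rescaling the index is a single multiplication, sketched below in Python with the printed coefficients 3.55, 0.720 and 0.161. `scale_index` is a hypothetical helper. Any common factor leaves the relative weights of age, height and weight unchanged, which is why the rescaled index classifies boys exactly as the original one does.

```python
def scale_index(coefs, key, target):
    """Multiply every weight by one common factor so that coefs[key] becomes
    target; the ratios between the weights are unchanged."""
    factor = target / coefs[key]
    return {name: round(w * factor, 2) for name, w in coefs.items()}

# Partial regression coefficients as printed: A = age, H = height, W = weight
printed = {"A": 3.55, "H": 0.720, "W": 0.161}
index = scale_index(printed, "A", 20)
print(" + ".join(f"{w}{name}" for name, w in index.items()))
# 20.0A + 4.06H + 0.91W
```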

Solution of a Five-variable Problem. The complete solution 
of a problem involving the multiple correlation of four variables 
with the criterion will be helpful to the student. Suppose we have 
given a criterion of general athletic ability, namely, the combined 
scores in seven different elements of that ability including forty-one 
separate tests distributed in the seven elements. We wish to see how 
much faith we may put in the combined scores of four tests as a 
measure of general athletic ability, in other words, how well these 
four tests measure general athletic ability. These tests are (1) base- 
ball throw for distance, (2) standing broad jump, (3) long dive (for 
distance) and (4) 120-yard low hurdles. Table XVII contains the 
simple correlations necessary for the solution of the problem. Raw 
scores have been reduced to T-scores for simplicity, thus making 
standard deviations equal to 10. This is one of the fundamental 
concepts of the T-scale method of scoring.⁹


TABLE XVII

CORRELATIONS OF 213 COLLEGE MEN IN PERFORMANCE ON FOUR TESTS

                                  X0          X1          X2            X3
                                  Criterion   Baseball    Standing      Long
                                  score       throw       broad jump    dive
Baseball throw (X1)               .723
Standing broad jump (X2)          .654        .332
Long dive (X3)                    .565        .291        .340
120-yd. low hurdles (X4)          .740        .478        .429          .342

M0 = 350          σ0 = 39.45
M1 = M2 = M3 = M4 = 50
σ1 = σ2 = σ3 = σ4 = 10

The regression equation for a five-variable problem of this sort will be

$$\bar{X}_0 = b_{01.234}X_1 + b_{02.134}X_2 + b_{03.124}X_3 + b_{04.123}X_4 + K$$

where K is a constant. Note that the subscript zero is used when the criterion is involved.

⁹ McCall, W. A., op. cit., pp. 505-508.

The regression coefficients necessary are:

$$b_{01.234} = r_{01.234}\,\frac{\sigma_{0.1234}}{\sigma_{1.0234}} \qquad b_{03.124} = r_{03.124}\,\frac{\sigma_{0.1234}}{\sigma_{3.0124}}$$

$$b_{02.134} = r_{02.134}\,\frac{\sigma_{0.1234}}{\sigma_{2.0134}} \qquad b_{04.123} = r_{04.123}\,\frac{\sigma_{0.1234}}{\sigma_{4.0123}}$$


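Partial correlation coefficients needed are:

(a) $r_{01.234}$, which may be written
$$r_{01.234} = \frac{r_{01.34} - r_{02.34}\,r_{12.34}}{\sqrt{1 - r_{02.34}^2}\sqrt{1 - r_{12.34}^2}}$$

(b) $r_{02.134}$, which may be written
$$r_{02.134} = \frac{r_{02.34} - r_{01.34}\,r_{12.34}}{\sqrt{1 - r_{01.34}^2}\sqrt{1 - r_{12.34}^2}}$$

(c)
$$r_{03.124} = \frac{r_{03.12} - r_{04.12}\,r_{34.12}}{\sqrt{1 - r_{04.12}^2}\sqrt{1 - r_{34.12}^2}}$$

(d)
$$r_{04.123} = \frac{r_{04.12} - r_{03.12}\,r_{34.12}}{\sqrt{1 - r_{03.12}^2}\sqrt{1 - r_{34.12}^2}}$$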


To solve for the partial r’s in (а), three partial r’s of the second 
order are necessary. 


(а) газ — Г04.3714.3 
=: a A 2 
ix М1 — гиз М1 — rus 
(а) гоз — ГО.3724.3 


_ С E 
rau = J1 — rus V1 Pus 


(a3) гоз — CUIA. 


= e 
E М1 — rus М1 — Pas 


It will be immediately seen that six partial r’s of the first order 
must be found before the three equations above can be solved. 
hese can be found readily as indicated in equations (7) and (8) 
Page 286. They will be: 
газ, P043, (14.87 102.37 124.3) 712.3 
The solution of the partial r’s in (6) involve no other partial r's 
an are already at hand in the computation of the гіп (a). 
or the partial r’s in (c), however, three partial r’s of the second 
order are necessary. 
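$$\text{(c1)}\quad r_{03.12} = \frac{r_{03.1} - r_{02.1}\,r_{23.1}}{\sqrt{1 - r_{02.1}^2}\,\sqrt{1 - r_{23.1}^2}}$$

$$\text{(c2)}\quad r_{04.12} = \frac{r_{04.1} - r_{02.1}\,r_{24.1}}{\sqrt{1 - r_{02.1}^2}\,\sqrt{1 - r_{24.1}^2}}$$

$$\text{(c3)}\quad r_{34.12} = \frac{r_{34.1} - r_{23.1}\,r_{24.1}}{\sqrt{1 - r_{23.1}^2}\,\sqrt{1 - r_{24.1}^2}}$$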


292 Tools of Measurement 


Six partial r's of the first order must be found for the solution of these:

$$r_{03.1},\; r_{02.1},\; r_{23.1},\; r_{04.1},\; r_{24.1},\; r_{34.1}$$


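Fourth order standard deviations will be required as follows:

$$\sigma_{0.1234} = \sigma_0\sqrt{1 - r_{01}^2}\,\sqrt{1 - r_{02.1}^2}\,\sqrt{1 - r_{03.12}^2}\,\sqrt{1 - r_{04.123}^2}$$

which equals the standard error of estimate, or σ est.

$$\sigma_{1.0234}\ (\text{or } \sigma_{1.4320}) = \sigma_1\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{12.34}^2}\,\sqrt{1 - r_{10.234}^2}$$

$$\sigma_{2.0134}\ (\text{or } \sigma_{2.4310}) = \sigma_2\sqrt{1 - r_{24}^2}\,\sqrt{1 - r_{23.4}^2}\,\sqrt{1 - r_{21.34}^2}\,\sqrt{1 - r_{20.134}^2}$$

$$\sigma_{3.0124}\ (\text{or } \sigma_{3.1240}) = \sigma_3\sqrt{1 - r_{31}^2}\,\sqrt{1 - r_{32.1}^2}\,\sqrt{1 - r_{34.12}^2}\,\sqrt{1 - r_{30.124}^2}$$

$$\sigma_{4.0123}\ (\text{or } \sigma_{4.1230}) = \sigma_4\sqrt{1 - r_{41}^2}\,\sqrt{1 - r_{42.1}^2}\,\sqrt{1 - r_{43.12}^2}\,\sqrt{1 - r_{40.123}^2}$$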


The coefficient of multiple correlation is

$$R_{0.1234} = \sqrt{1 - \frac{\sigma_{0.1234}^2}{\sigma_0^2}}$$


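Returning to the solutions necessary in (a), involving those in (a1), (a2) and (a3):

$$r_{01.3} = \frac{r_{01} - r_{03}r_{13}}{\sqrt{1 - r_{03}^2}\,\sqrt{1 - r_{13}^2}} = \frac{.725 - .565\times.291}{\sqrt{1 - (.565)^2}\,\sqrt{1 - (.291)^2}} = \frac{.725 - .164}{.825\times.957} = .709$$

$$r_{04.3} = \frac{r_{04} - r_{03}r_{34}}{\sqrt{1 - r_{03}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.740 - .565\times.342}{.825\times.940} = .706$$

$$r_{14.3} = \frac{r_{14} - r_{13}r_{34}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.478 - .291\times.342}{.957\times.940} = .421$$

$$r_{02.3} = \frac{r_{02} - r_{03}r_{23}}{\sqrt{1 - r_{03}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.654 - .565\times.340}{.825\times.940} = .595$$

$$r_{24.3} = \frac{r_{24} - r_{23}r_{34}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.429 - .340\times.342}{.940\times.940} = .354$$

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.332 - .291\times.340}{.957\times.940} = .259$$

$$\text{(a1)}\quad r_{01.34} = \frac{.709 - .706\times.421}{.708\times.907} = .643$$

$$\text{(a2)}\quad r_{02.34} = \frac{.595 - .706\times.354}{.708\times.935} = .522$$

$$\text{(a3)}\quad r_{12.34} = \frac{.259 - .421\times.354}{.907\times.935} = .130$$

$$\text{(a)}\quad r_{01.234} = \frac{.643 - .522\times.130}{.853\times.992} = .681$$

$$\text{(b)}\quad r_{02.134} = \frac{.522 - .643\times.130}{.766\times.992} = .577$$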


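In the solution of (c), we first obtain the six partial r's of the first order:

$$r_{03.1} = \frac{r_{03} - r_{01}r_{13}}{\sqrt{1 - r_{01}^2}\,\sqrt{1 - r_{13}^2}} = \frac{.565 - .725\times.291}{.689\times.957} = .537$$

$$r_{02.1} = \frac{r_{02} - r_{01}r_{12}}{\sqrt{1 - r_{01}^2}\,\sqrt{1 - r_{12}^2}} = \frac{.654 - .725\times.332}{.689\times.943} = .633$$

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{.340 - .332\times.291}{.943\times.957} = .270$$

$$r_{04.1} = \frac{r_{04} - r_{01}r_{14}}{\sqrt{1 - r_{01}^2}\,\sqrt{1 - r_{14}^2}} = \frac{.740 - .725\times.478}{.689\times.878} = .650$$

$$r_{24.1} = \frac{r_{24} - r_{12}r_{14}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{14}^2}} = \frac{.429 - .332\times.478}{.943\times.878} = .326$$

$$r_{34.1} = \frac{r_{34} - r_{13}r_{14}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{14}^2}} = \frac{.342 - .291\times.478}{.957\times.878} = .242$$

Then

$$\text{(c1)}\quad r_{03.12} = \frac{.537 - .633\times.270}{.775\times.963} = .491$$

$$\text{(c2)}\quad r_{04.12} = \frac{.650 - .633\times.326}{.775\times.945} = .607$$

$$\text{(c3)}\quad r_{34.12} = \frac{.242 - .270\times.326}{.963\times.945} = .169$$

$$\text{(c)}\quad r_{03.124} = \frac{.491 - .607\times.169}{.795\times.986} = .496$$

$$\text{(d)}\quad r_{04.123} = \frac{.607 - .491\times.169}{.871\times.986} = .610$$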


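Computation of the fourth order σ's is as follows:

$$\sigma_{0.1234} = 39.45\times.689\times.775\times.871\times.792 = 14.53$$

$$\sigma_{1.0234}\ (\text{or } \sigma_{1.4320}) = 10\times.878\times.988\times.992\times.732 = 6.30, \qquad \text{when } r_{13.4} = .155$$

$$\sigma_{2.0134}\ (\text{or } \sigma_{2.4310}) = 10\times.903\times.974\times.992\times.817 = 7.13, \qquad \text{when } r_{23.4} = .227$$

$$\sigma_{3.0124}\ (\text{or } \sigma_{3.1240}) = 10\times.957\times.963\times.986\times.868 = 7.88$$

$$\sigma_{4.0123}\ (\text{or } \sigma_{4.1230}) = 10\times.878\times.945\times.986\times.792 = 6.49$$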


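Computation of Regression Coefficients:

$$b_{01.234} = .681\times\frac{14.53}{6.30} = 1.57 \qquad b_{02.134} = .577\times\frac{14.53}{7.13} = 1.18$$

$$b_{03.124} = .496\times\frac{14.53}{7.88} = .92 \qquad b_{04.123} = .610\times\frac{14.53}{6.49} = 1.37$$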


The regression equation will be

$$X_0 = 1.57(X_1 - M_1) + 1.18(X_2 - M_2) + .92(X_3 - M_3) + 1.37(X_4 - M_4) + M_0$$

Substituting and collecting terms,

$$X_0 = 1.57X_1 + 1.18X_2 + .92X_3 + 1.37X_4 + 98$$

$$R_{0.1234} = \sqrt{1 - \frac{(14.53)^2}{(39.45)^2}} = .930$$

As a check,

$$R_{0.1234} = \sqrt{1 - (1 - r_{01}^2)(1 - r_{02.1}^2)(1 - r_{03.12}^2)(1 - r_{04.123}^2)} = .930$$
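As a cross-check on the hand solution, the whole chain can be run by machine. The sketch below is plain Python; `partial_r`, `residual_sd` and the dictionary names are our own, not the authors'. It applies the first-order partial-correlation formula recursively to the zero-order r's of the table (the order in which variables are held constant does not affect the result), then forms σ0.1234, R0.1234, the four b's and the constant K.

```python
from math import sqrt

# Zero-order r's from the correlation table (0 = criterion score;
# 1 = baseball throw, 2 = standing broad jump, 3 = long dive, 4 = hurdles).
R = {(0, 1): .725, (0, 2): .654, (0, 3): .565, (0, 4): .740,
     (1, 2): .332, (1, 3): .291, (1, 4): .478,
     (2, 3): .340, (2, 4): .429, (3, 4): .342}
SD = {0: 39.45, 1: 10, 2: 10, 3: 10, 4: 10}
MEAN = {0: 350, 1: 50, 2: 50, 3: 50, 4: 50}


def partial_r(i, j, held=()):
    """r_{ij.held}: correlation of i and j with the variables in `held` constant.

    Applies the first-order formula repeatedly, removing the last
    held-constant variable at each step.
    """
    if not held:
        return R[(min(i, j), max(i, j))]
    k, rest = held[-1], held[:-1]
    r_ij, r_ik, r_jk = partial_r(i, j, rest), partial_r(i, k, rest), partial_r(j, k, rest)
    return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))


def residual_sd(i, others):
    """sigma_{i.others}: sigma_i times sqrt(1 - r^2) for each successive partial r."""
    sd = SD[i]
    for n, k in enumerate(others):
        sd *= sqrt(1 - partial_r(i, k, others[:n]) ** 2)
    return sd


PREDICTORS = (1, 2, 3, 4)
se_est = residual_sd(0, PREDICTORS)              # sigma_{0.1234}
multiple_R = sqrt(1 - se_est ** 2 / SD[0] ** 2)  # R_{0.1234}

b = {}
for i in PREDICTORS:
    rest = tuple(p for p in PREDICTORS if p != i)
    # b_{0i.rest} = r_{0i.rest} * sigma_{0.1234} / sigma_{i.(rest, 0)}
    b[i] = partial_r(0, i, rest) * se_est / residual_sd(i, rest + (0,))
K = MEAN[0] - sum(b[i] * MEAN[i] for i in PREDICTORS)

print(f"sigma_0.1234 = {se_est:.2f}   R_0.1234 = {multiple_R:.3f}")
print({i: round(v, 2) for i, v in b.items()}, round(K, 1))
```

Carried at full precision, the sketch gives roughly σ0.1234 ≈ 14.4, R0.1234 ≈ .93, b's of about 1.56, 1.17, .91 and 1.36, and a constant near 100. The small differences from the figures above come from rounding each step to three places by hand.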


Knowing T-score values on each of the four variables, (1) baseball throw, (2) standing broad jump, (3) long dive, and (4) 120-yard low hurdles, we can estimate general athletic ability with a considerable degree of accuracy by using the regression equation above. Just how accurately may be shown in the following statement: in 68 per cent of the cases, general athletic ability may be estimated within 14.5 points. Since the standard error of estimate ($\sigma_{0.1234} = 14.53$) is but 37 per cent of the standard deviation, the estimate is consequently 63 per cent better than a guess. When $R_{0.1234} = 1$ the standard error of estimate will be 0.0 and prediction will be perfect; when $R_{0.1234} = .00$, the standard error of estimate will be 39.45 and hence an estimate will be pure guesswork.
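Using the equation is simple arithmetic. The sketch below (function names are illustrative) applies the regression equation above to a set of T-scores and gives the band one standard error of estimate on either side. The 68 per cent statement assumes errors of estimate are normally distributed. It also computes the ratio σ0.1234/σ0, often called the coefficient of alienation; its complement is the "better than a guess" figure.

```python
# Regression equation from the worked example; inputs are T-scores keyed 1-4.
B = {1: 1.57, 2: 1.18, 3: .92, 4: 1.37}
CONSTANT = 98
SE_EST = 14.53    # sigma_{0.1234}, standard error of estimate
SIGMA_0 = 39.45   # S.D. of the criterion scores


def estimate(t_scores):
    """Estimated general athletic ability from T-scores keyed 1-4."""
    return CONSTANT + sum(B[i] * t_scores[i] for i in B)


def band_68(t_scores):
    """Range expected to hold the true criterion score about 68 times in 100,
    assuming errors of estimate are normally distributed."""
    est = estimate(t_scores)
    return est - SE_EST, est + SE_EST


# Share of the criterion's S.D. left as error, and the gain over a guess.
k = SE_EST / SIGMA_0          # equals sqrt(1 - R**2)
better_than_guess = 1 - k

average_man = {1: 50, 2: 50, 3: 50, 4: 50}
print(estimate(average_man), band_68(average_man), round(k, 2), round(better_than_guess, 2))
```

A man at the mean on all four tests (T = 50) is estimated at the criterion mean, 350, as the equation requires.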


Selected References 


GARRETT, HENRY E.: Statistics in Psychology and Education, Second Edition, Chapters IX, XII, XIII, and XIV. New York, Longmans, Green and Company, 1945. Pp. xiv and 493.

The meaning of correlation and its calculation by a number of methods are 
explained in a manner which should be clear to all. The methods discussed 
include: (1) product-moment, (2) method of rank differences, (3) Spearman 
foot rule, (4) coefficient of contingency, and (5) correlation ratio. The presen- 
tation of partial and multiple correlation techniques is excellent and fully 
explained and illustrated by problems. 

GUILFORD, J. P.: Fundamental Statistics in Psychology and Education. New York, 
McGraw-Hill Book Company, 1942. Pp. xi and 333. 

Chapters XI, XII, and XIII deal with computation of correlations and
include treatment of a variety of methods.

HOLZINGER, KARL J.: Statistical Methods for Students in Education. Boston,
Ginn and Company, 1928. Pp. viii and 372. 

Chapter IX, “Linear Correlation with Quantitative Series,” pp. 141-176, 
includes sections on the interpretation of the correlation coefficient and some 
uses of correlation in evaluating test material. 

Chapter X, "Non-Linear Correlation," pp. 177-189, contains a fine illustration
of the solution of this type of problem. 

Chapter XIV, "Further Methods of Correlation for Two Characters," pp.
256-282. Among the methods presented are: (1) biserial r, (2) coefficient of
contingency, and (3) correlation from ranks. 

Chapter XV, “Partial and Multiple Correlation.” Problems involving three 
and four variables are illustrated. Solution by determinants is also shown. 

ODELL, C. W.: An Introduction to Educational Statistics. New York, Prentice-Hall, Inc., 1946. Pp. xiii and 269.




Chapters VII through XII offer a full discussion of the coefficient of correla- 
tion including several methods of computation, regression and interpretation. 
THURSTONE, L. L.: The Fundamentals of Statistics. New York, The Macmillan
Company, 1928. Pp. xvi and 237.
Chapters XXII and XXIII, pp. 187-225, explain the nature of correlation
and the Pearson product-moment method of computing it.
WALKER, HELEN M.: Elementary Statistical Methods, Chapters XII, XIII and
XIV. New York, Henry Holt and Company, 1943. Pp. xxv and 368. 
These chapters will be found particularly valuable in explaining the meaning, 
calculation and interpretation of a coefficient of correlation. 


CHAPTER XV
Elementary Graphical Methods:
Some Properties and Uses of the Normal Curve


It is not within the scope of this text to set forth anything but the 
most elementary graphical treatment of material so that the student 
who is dealing with measurement work in physical education may 
know how to proceed with a pictorial analysis of the data at hand. 


Fig. 13. Showing coordinate axes and method of plotting any point as "P." (Axes: Y-axis or ordinate; X-axis or abscissa; P at x = 5, y = 4.)


Simple Algebraic Principles. The two basic lines used in 
plotting or graphing are called the coordinate axes and intersect 
each other at right angles at a point which is called the origin "O."



Elementary Graphical Methods 297 


The vertical axis or "ordinate" is called the Y-axis and the horizontal the X-axis or "abscissa" (see Fig. 13). In order to plot any point "P" whose coordinates are x = 5, y = 4, we proceed from the origin to the right five units on the X-axis and up from the origin four units on the Y-axis. Where perpendiculars erected at these points intersect will be located the desired point "P." In this way any point whose coordinate values are known can be located.

Frequency Polygon. Let it be required to plot the data of
Table XVIII in a frequency polygon. The intervals of distance are

TABLE XVIII

DATA ON STANDING BROAD JUMP
Entering College Freshmen Group

Distance*       Frequency    Cumulative frequency
9'4"-9'6"           2               534
9'1"-9'3"           4               532
8'10"-9'0"         14               528
8'7"-8'9"          26               514
8'4"-8'6"          47               488
8'1"-8'3"          66               441
7'10"-8'0"         74               375
7'7"-7'9"          67               301
7'4"-7'6"          79               234
7'1"-7'3"          52               155
6'10"-7'0"         32               103
6'7"-6'9"          32                71
6'4"-6'6"          19                39
6'1"-6'3"          10                20
5'10"-6'0"          5                10
5'7"-5'9"           4                 5
5'4"-5'6"           0                 1
5'1"-5'3"           1                 1

N = 534
*Data collected to the nearest inch.


laid off at regular intervals along the base line or X-axis from the origin. To conserve space a break is placed parallel with and just to the right of the Y-axis to show that the intervals below 5 feet, 1 inch to 5 feet, 3 inches have been omitted. The frequencies within each interval are laid off on the scale along the Y-axis. In setting points to be joined for the frequency polygon it is important to remember that the frequency of any given interval is represented by its midpoint. Thus the frequency for the interval 5 feet, 1 inch to 5 feet, 3 inches is 1, so we go out on the X-axis to the midpoint of this interval (5 feet, 2 inches) and up 1 unit on the Y-axis.

In this way all points are set and then joined by straight lines in regular order as indicated. The distances of the coordinate paper taken to represent the intervals will depend entirely on the paper used and the size of the interval. In frequency polygons the area between the boundary line of the polygon and the base line represents the total frequency (N) of the distribution.

Fig. 14. Frequency polygon plotted from data of Table XVIII. (Horizontal scale in feet and inches; vertical scale, frequency.)
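The plotting rule can be checked numerically. This sketch (our own names) lists Table XVIII by lower limit in inches and places each frequency at its interval midpoint. It closes the polygon at the base line one interval beyond each end, a common convention assumed here, and confirms that the area under the polygon, measured in interval units, equals N.

```python
# Table XVIII: (lower limit in inches, frequency) for each 3-inch interval,
# from 5'1"-5'3" (61-63 in.) up to 9'4"-9'6" (112-114 in.).
TABLE_XVIII = [
    (61, 1), (64, 0), (67, 4), (70, 5), (73, 10), (76, 19), (79, 32),
    (82, 32), (85, 52), (88, 79), (91, 67), (94, 74), (97, 66),
    (100, 47), (103, 26), (106, 14), (109, 4), (112, 2),
]
WIDTH = 3  # inches per interval


def polygon_points(table, width=WIDTH):
    """(midpoint, frequency) vertices, closed to the base line one interval
    beyond each end so that the polygon encloses an area."""
    pts = [(low + (width - 1) / 2, f) for low, f in table]  # 61-63 in. -> 62
    return [(pts[0][0] - width, 0)] + pts + [(pts[-1][0] + width, 0)]


def area_under(points):
    """Area between the polygon and the base line (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = polygon_points(TABLE_XVIII)
N = sum(f for _, f in TABLE_XVIII)
print(N, area_under(pts) / WIDTH)   # area in interval units equals N
```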

Histogram. A histogram is constructed in much the same manner as a frequency polygon except that all of the scores of a given interval are not considered as concentrated at the midpoint but as spread out uniformly over the entire interval. On this account the number of cases in a given interval is represented by a rectangle, whose altitude is equal to the number of cases within the interval. In drawing a histogram it is not necessary to project the sides of these rectangles to the base line. The boundary lines will show the increase or decrease in the number of cases made in each interval.

Fig. 15. Histogram for standing broad jump of 534 college men. U.C.L.A. data. (Vertical scale, frequency.)
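For the histogram the only change is that each frequency covers its whole interval. The jumps were recorded to the nearest inch, so the sketch assumes that the interval 5 feet, 1 inch to 5 feet, 3 inches actually runs from 60.5 to 63.5 inches. It also rebuilds the cumulative-frequency column of Table XVIII as a check on the data. The function names are illustrative.

```python
# Table XVIII again: (lower limit in inches, frequency) for each interval.
TABLE_XVIII = [
    (61, 1), (64, 0), (67, 4), (70, 5), (73, 10), (76, 19), (79, 32),
    (82, 32), (85, 52), (88, 79), (91, 67), (94, 74), (97, 66),
    (100, 47), (103, 26), (106, 14), (109, 4), (112, 2),
]


def histogram_bars(table, width=3):
    """(left edge, right edge, height) for each rectangle.

    Jumps were recorded to the nearest inch, so an interval listed as
    61-63 in. is taken to run from 60.5 to 63.5 in. (assumed real limits).
    """
    return [(low - 0.5, low - 0.5 + width, f) for low, f in table]


def cumulative(table):
    """Running totals from the lowest interval upward."""
    running, totals = 0, []
    for _, f in table:
        running += f
        totals.append(running)
    return totals


bars = histogram_bars(TABLE_XVIII)
cum = cumulative(TABLE_XVIII)
print(bars[0], bars[-1], cum[-1])
```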


Fig. 16. Percentage of boys passing Athletic Badge Test. (Bars for the seventh through twelfth grades and for college freshmen; horizontal scale, 0 to 80 per cent.)


Fig. 17. Superiority of athletes over a normal group in T-score units above the mean in a battery of athletic ability tests. (Events shown include jump and reach, chinning, standing broad jump, 100-yd. dash, baseball throw for distance and long dive; horizontal scale, T-score points above the mean.)

Such a method of representation is valuable where rapidity of construction and general impressions are desired.

Column or Bar Diagram. Occasionally it is necessary to show graphically quantities which exhibit differences in one dimension. This can be done quite readily by the use of the column or bar diagram. Two illustrations may be given, the first hypothetical, the second actual.




axis will indicate the score that only 25 per cent of the total group
can make. In Fig. 19, only 25 per cent could jump 8 feet, 2¼ inches.

An Important Application of the Normal Curve. It has 
been previously pointed out that the area under any frequency 
polygon represents the total number of frequencies in the distribu- 
tion. Knowing the total area between the curve and the baseline 
and the proportion of this area in any given segment, it is possible 
to compute the chances that a score will fall in that segment. Tables 
have been devised by which we can very readily calculate such 
chances. Table XXXII in the Appendix has been taken from "Tables for Statisticians and Biometricians," edited by Karl Pearson, Cambridge University Press. This table gives various decimal portions of the total area under the normal curve between the mean and points at varying standard deviation units from the mean, that is, at distances of

$$\frac{x}{\sigma} = \frac{\text{Score} - \text{Mean}}{\text{S.D. of distribution}}.$$

The column on the left of the

table represents the standard deviation unit distance while the 
columns to the right represent percentage portions of the curve to 
be found between the mean and the particular point in question. 
To facilitate calculation the total area of the curve has been arbitrarily taken as 10,000 cases. Suppose we wish to know the percentage of cases falling outside of the point at 1.5 standard deviations above the mean. In the column (0) to the right of 1.5 we find the figure 4332. This figure being based upon 10,000 cases, we know
that 43.32 per cent of the cases in a normal distribution lie between 
the mean and 1.5 standard deviations above the mean. Since 50 per 
cent of the cases lie above the mean and 50 per cent below, we there- 
fore determine that 6.68 per cent of the cases lie above 1.5 sigma 
units. The chances that persons will make scores above 1.5 sigma 
units are therefore only about 7 in 100. Likewise, approximately 
7 individuals in 100 will score below minus 1.5 sigma units. 
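Where a printed table is not at hand, the same areas can be computed. The sketch uses Python's standard-library `statistics.NormalDist`. The helper `area_mean_to` is our own name; it returns the proportion between the mean and z, which is the quantity Table XXXII lists per 10,000 cases.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, S.D. 1


def area_mean_to(z):
    """Proportion of a normal distribution between the mean and z sigma,
    the quantity Table XXXII lists (there per 10,000 cases)."""
    return abs(UNIT_NORMAL.cdf(z) - 0.5)


between = area_mean_to(1.5)    # mean to +1.5 sigma
beyond = 0.5 - between         # above +1.5 sigma
print(round(between * 10000), round(beyond * 100, 2))  # table entry, per cent
```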

A number of interesting problems may be solved by the use of Table XXXII, if we consider that the distribution under consideration approximates a normal curve. Some samples of such problems are offered.

Problem 1. Suppose we have a distribution of a group in the throw for distance (using a 12-inch playground ball) where M = 170.2 feet and σ = 26.1 feet. The question is asked, "How far must a boy throw the ball to be classed in the upper 5 per cent?"


Since Table XXXII gives the percentage in a normal group at given 
distances (in terms of number of standard deviations) above and 
below the mean, a performance to be classed in the upper 5 per cent 
will be found at a sigma distance represented in the table by 45 per 
cent. In the body of the table we find 4495 and 4505, the sigma 
distances for which are 1.64 and 1.65 above the mean. By means of 
interpolation 45 per cent is exactly midway between, or 1.645 sigma above the mean. Calculating the distance we then have

$$M + (1.645\times\text{S.D.}) = 170.2 + (1.645\times 26.1) = 213.1$$

The statement may then be made that any boy who throws 213.1 feet or more may be classed in the upper 5 per cent.
To class a throw in the lower 15 per cent, the following computations must be made:

1. Lower 15 per cent is 35 per cent below the mean.

2. The figures nearest to 35 per cent in the body of the table read 3485 and 3508, or 1.03 and 1.04 standard deviations below the mean.

3. Since 3500 lies approximately ⅔ of the way between the two, the additional decimal place would be .0067 or .007.

4. The lower 15 per cent may then be calculated as

$$M - (1.037\times\text{S.D.}) = 170.2 - (1.037\times 26.1) = 143.1$$

Any boy who throws 143.1 feet or less should be classed in the lower 15 per cent.
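Problem 1 can be reproduced in a few lines. `NormalDist.inv_cdf` returns the score below which a given proportion of the group falls, which replaces the table search and interpolation. For comparison, `interpolate_z` (our own name) repeats the book's straight-line interpolation between two table entries.

```python
from statistics import NormalDist

throw = NormalDist(mu=170.2, sigma=26.1)   # throw for distance, in feet

upper_5 = throw.inv_cdf(0.95)    # boundary of the upper 5 per cent
lower_15 = throw.inv_cdf(0.15)   # boundary of the lower 15 per cent


def interpolate_z(z_low, z_high, entry_low, entry_high, target):
    """Straight-line interpolation between two Table XXXII entries."""
    return z_low + (target - entry_low) / (entry_high - entry_low) * (z_high - z_low)


z_upper = interpolate_z(1.64, 1.65, 4495, 4505, 4500)   # 1.645
z_lower = interpolate_z(1.03, 1.04, 3485, 3508, 3500)   # about 1.037
print(round(upper_5, 1), round(lower_15, 1), round(z_upper, 3), round(z_lower, 3))
```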
Problem 2. It is often desirable to know the limits which will include the middle 45, 50 or 60 per cent of the group. The middle 45 per cent will be included between the mean and 22.5 per cent on each side of the mean. We find from Table XXXII that .6 sigma above and below the mean will include 22.57 per cent of the cases on each side. Since this figure so closely approximates 22.50, we may compute the middle 45 per cent as 170.2 ± (.6 × 26.1), or 154.54 feet to 185.86 feet.

In the same manner, the limits for the middle 50 or 60 per cent of the group may be calculated.
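The middle-per-cent limits of Problem 2 follow the same pattern. The sketch (assumed helper name `middle_limits`) uses the exact sigma distance rather than the rounded .6σ. For the middle 45 per cent it therefore gives about 154.6 to 185.8 feet, very close to the figures above.

```python
from statistics import NormalDist

throw = NormalDist(mu=170.2, sigma=26.1)   # throw for distance, in feet


def middle_limits(dist, share):
    """Limits enclosing the middle `share` of a normal distribution."""
    tail = (1 - share) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


for share in (0.45, 0.50, 0.60):
    low, high = middle_limits(throw, share)
    print(f"middle {share:.0%}: {low:.1f} to {high:.1f} feet")
```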
Problem 3. The student or teacher frequently wishes to know how good or bad certain performances are in terms of percentage or in terms of the number of individuals out of 100 or 1000 who may normally be expected to perform as well or as poorly.


In such cases, calculate the number of standard deviations 
between the mean and the performance in question. From this 
computation determine from Table XXXII the percentage of cases 
between the mean and the performance. Subtract this from 50 and 
thus obtain the upper or lower percentage desired. In the Standing Broad Jump for College Men, where M = 7 feet, 7.6 inches and σ = 8.5 inches, one man in the group jumps 9 feet, 3 inches. How good is this jump with respect to a percentage of the group? The jump is 19.4 inches, or 2.28σ, above the mean. Table XXXII tells us that 48.87 per cent of the cases are contained between the mean and 9 feet, 3 inches. The jump, therefore, is in the upper 1.13 per cent. This means that only 1.13 men in every 100 do as well or better, or we may say 11.3 in 1000 or 113 in 10,000.
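Problem 3 reverses the procedure, going from a performance to a proportion. The sketch below (hypothetical helper `share_as_good_or_better`) converts the jump to inches. With the unrounded z of about 2.282, it gives a slightly smaller share than the table figure for 2.28: roughly 1.12 instead of 1.13 per cent.

```python
from statistics import NormalDist


def share_as_good_or_better(score, mean, sd, higher_is_better=True):
    """Proportion of a normal group expected to equal or better `score`."""
    below = NormalDist(mean, sd).cdf(score)
    return 1 - below if higher_is_better else below


MEAN_IN = 7 * 12 + 7.6   # 7 feet, 7.6 inches
SD_IN = 8.5
JUMP_IN = 9 * 12 + 3     # 9 feet, 3 inches

z = (JUMP_IN - MEAN_IN) / SD_IN
share = share_as_good_or_better(JUMP_IN, MEAN_IN, SD_IN)
print(round(z, 2), round(share * 100, 2), round(share * 10000))
```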

Problem 4. Suppose that we have two groups of college men, 
Freshmen and Sophomores, whose times have been taken in the 
100-yard dash and we wish to know how significant is the difference 
between the average time made in each case. Suppose also that these 
groups are unselected and that the trait (speed in the 100-yard dash)
is distributed normally.¹


                     Mean      S.D.    σ of mean    No. of cases
Freshmen (1) ...... 12.588     .750      .0590          155
Sophomores (2) .... 12.600     .792      .0946           70

Actual difference .012.


The formula for calculating the reliability of the difference is as follows:

$$\sigma_{\text{diff.}} = \sqrt{(\sigma_{M_1})^2 + (\sigma_{M_2})^2}$$

where $\sigma_{\text{diff.}}$ is the standard error of the difference, $\sigma_{M_1}$ is the standard error of the mean of the first group and $\sigma_{M_2}$ is the standard error of the mean of the second group to be measured. "D" stands for the actual difference (.012).

$$\sigma_{\text{diff.}} = \sqrt{(.0590)^2 + (.0946)^2} = .1115$$

$$\frac{D}{\sigma_{\text{diff.}}} = \frac{.012}{.1115} = .108$$
The question now arises as to how much actual difference should be required to guarantee that the freshmen will always be better than the sophomores, that is, to insure absolute reliability in the obtained difference. It appears at first glance that the freshmen are slightly better than the sophomores, but is this difference real or due to chance? Garrett² holds that it is usually customary to take a $D/\sigma_{\text{diff.}}$ of three as indicative of significant reliability, since ±3 sigma includes practically all of the cases in the "distribution of differences" above and below the mean. "A $D/\sigma_{\text{diff.}}$ greater than three is to be taken as indicating just so much added reliability."³

¹ Cozens, Frederick W.: The Measurement of General Athletic …, p. 143. Eugene, University of Oregon Press, 1929.


Again referring to Table XXXII, it may be noted that a $D/\sigma_{\text{diff.}}$ of .108 gives a value of a little less than 438 (out of 10,000), that is, a little less than 4.38 per cent. Now if $D/\sigma_{\text{diff.}}$ equaled 0, we should say that the chances were 50-50, or even, that the freshmen would be better than the sophomores or vice versa. But they are approximately 4 per cent more. Hence the chances are 54 in 100 that the true difference is greater than zero. A difference of this sort, then, is not to be taken as significant.

However, if the $D/\sigma_{\text{diff.}}$ in the above case had been 2.87, we would conclude that the chances are 998 in 1000 that the true difference will always be greater than zero, a rather significant figure.
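The computation of Problem 4 is sketched below. Like the formula above, it assumes that the two groups are independent, so no correlation term enters. It then reads the chance that the true difference exceeds zero from the normal curve. `critical_ratio` is our own name.

```python
from math import sqrt
from statistics import NormalDist


def critical_ratio(mean_1, se_mean_1, mean_2, se_mean_2):
    """D / sigma_diff for the means of two independent groups."""
    se_diff = sqrt(se_mean_1 ** 2 + se_mean_2 ** 2)
    return abs(mean_1 - mean_2) / se_diff, se_diff


ratio, se_diff = critical_ratio(12.588, .0590, 12.600, .0946)
chances = NormalDist().cdf(ratio)    # chance the true difference exceeds zero
print(round(se_diff, 4), round(ratio, 3), round(chances * 100))

print(round(NormalDist().cdf(2.87) * 1000))   # the case of a ratio of 2.87
```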

Transmuting Judgment Scores into Scores on a Linear Scale. In dealing with certain types of problems in physical education it may be necessary to secure expert opinion on the value of a given test as measuring a particular ability or an expert judgment on an individual's ability along a certain line. To facilitate the calculation of correlation coefficients it may be desirable to transmute these opinions or judgments into scores on a linear scale, say of 100 points. Let us say that ten judges have been asked to rank a number of men in the quality known as GENERAL ATHLETIC ABILITY on the basis of one to ten, one representing superior ability and ten inferior ability. A procedure for a situation of this kind has been worked out by Hull⁴ and further illustrated by Garrett.⁵


² Garrett, op. cit., pp. 213-215.
³ Ibid., p. 215.
⁴ Hull, Clark, "The Computation of the Pearson r from Ranked Data," Jr. of Applied Psychol., VI (1922), 385.
⁵ Garrett, H. E., op. cit., pp. 161-175.


TABLE XIX 


                           Judges                          Average   Per cent    Score
Student   I   II  III  IV   V   VI  VII VIII  IX   X       rank     position   (scale of 100)
   A      9    8   10   8   10   9   8   10   7    9       8.8        83           31
   B      …    …    …   …    …   …   …    …   …    …       1.5        10           75
   C      6    5    5   5    6   4   6    5   5    4       5.1        46           52
   D      3    4    3   2    4   3   4    2   3    3       3.1        26           63
   E      8    8    7   9    9   8   8    9   7    7       8.0        75           37
   F      9   10   10  10    9   9  10    9  10   10       9.6        91           24
   G      …    …    …   …    …   …   …    …   …    …        …          …            …
   H      …    …    …   …    …   …   …    …   …    …        …          …            …
   I      4    3    3   3    2   4   3    4   2    3       3.1        26           63
   J      7    6    6   7    5   6   6    6   6    8       6.0        55           48
   K      6    7    7   7    8   8   8    7   8    7       7.3        68           41
   L      6    6    5   5    6   4   6    5   5    5       5.3        48           51
   M      5    4    4   6    5   5   5    6   4    5       4.9        44           53
   N      6    4    4   6    6   5   4    6   5    6       5.0        45           52


Hull's formula for determining per cent position is as follows:

$$\text{Per cent position} = \frac{100(R - .5)}{N}$$

R = the rank in order of merit; in this case the average rank.
N = the number of ranks.

The formula was originally devised so that a series of individuals could be ranked in order of merit from one up to the number of individuals in the group, but since there are only 10 ranks, N should not be more than 10.

Taking student A (see Table XIX):

$$\text{Per cent position} = \frac{100(8.8 - .5)}{10} = 83$$


From Hull's Table (Table XXXIII in the Appendix) this individual's score is 31. In like manner the per cent position and score of all the remaining students may be computed; they are shown in the judgment table. Garrett⁶ offers a good explanation of Table XXXIII:

⁶ Garrett, H. E., op. cit., pp. 172-173.




"This table represents a normal frequency distribution which has been cut off at ±2.5σ. The baseline of the curve is 5σ, therefore, and may conveniently be subdivided into 100 parts, each .05σ. The first .05σ from the upper extreme limit of the curve takes in .09 of 1 per cent of the distribution and is scored 99 on a scale of 100. The next .05σ (.10σ from the upper end of the curve) takes in .20 of 1 per cent of the entire distribution and is scored 98. In each case, the per cent position gives the fractional part of the normal distribution which lies to the right of the given σ position on the base line. The σ-values determine the transmuted score."
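Hull's procedure can be imitated as below. The sketch rests on two assumptions of ours that the text does not state. First, the curve cut off at ±2.5σ is treated as the whole distribution. Second, the transmuted score is rounded to the nearest point. Under those assumptions it reproduces the scores in Table XIX from the per cent positions. Both function names are hypothetical.

```python
from statistics import NormalDist


def per_cent_position(rank, n_ranks):
    """Hull's formula: 100 (R - .5) / N."""
    return 100 * (rank - 0.5) / n_ranks


def transmuted_score(pct_position, cut=2.5, points=100):
    """Score on a `points`-point scale for a normal curve cut off at +/- cut sigma.

    Assumed reading of Hull's table: the truncated curve counts as the whole
    distribution, and the result is rounded to the nearest score point.
    """
    unit = NormalDist()
    top, bottom = unit.cdf(cut), unit.cdf(-cut)
    # Base-line point with pct_position per cent of the truncated area to its right.
    z = unit.inv_cdf(top - pct_position / 100 * (top - bottom))
    return round((z + cut) / (2 * cut) * points)


# Average ranks from Table XIX (ten judges, ten ranks).
for student, avg_rank in [("A", 8.8), ("B", 1.5), ("D", 3.1), ("F", 9.6)]:
    pct = per_cent_position(avg_rank, 10)
    print(student, round(pct), transmuted_score(pct))
```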

Using the values shown in Table XXXII, it is not a difficult matter to set up a table on the order of the one computed by Hull but taking in a range of three standard deviations on each side of the mean. The baseline will be 6σ and each score point may be represented by .06σ. (See Table XXXIV in the Appendix.)
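A table of the kind just described can be generated rather than read. For each score, the sketch (hypothetical `transmutation_table`) gives the per cent position at the lower edge of that score's .06σ step on a 6σ base line. With `cut=2.5` it yields Hull-style values such as the .09 and .20 of 1 per cent quoted above. Treating the truncated curve as the whole distribution is our assumption.

```python
from statistics import NormalDist


def transmutation_table(cut=3.0, points=100):
    """Per cent position at the lower edge of each score's step on the base line.

    The base line runs from -cut to +cut sigma in `points` equal steps
    (.06 sigma each when cut = 3); the truncated curve is treated as the
    whole distribution (an assumption).
    """
    unit = NormalDist()
    step = 2 * cut / points
    top = unit.cdf(cut)
    total = top - unit.cdf(-cut)
    return {score: 100 * (top - unit.cdf(-cut + step * score)) / total
            for score in range(points)}


three_sigma = transmutation_table()          # range of 3 sigma each side
hull_style = transmutation_table(cut=2.5)    # Hull's 2.5 sigma range
for score in (99, 98, 75, 50, 25):
    print(score, round(three_sigma[score], 2), round(hull_style[score], 2))
```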


Selected References 


GARRETT, Henry E.: Statistics in Psychology and Education, Second Edition, 
Chapters IV, V, and VI, pp. 62-157. New York, Longmans, Green and Com- 
pany, 1945. Pp. xiv and 493. 

These chapters discuss methods of plotting measures which have been grouped into a frequency distribution as well as the properties and uses of the Normal Probability Curve.

HOLZINGER, KARL J.: Statistical Methods for Students in Education. Boston, Ginn and Company, 1928. Pp. viii and 372.

Chapter III, pp. 31-46, contains an excellent tabular and graphical presentation of data.

Chapter XII, pp. 204-230. This discussion of "The Normal Probability Curve" is more advanced than is found in Garrett. The following topics not ordinarily found in an elementary text are discussed briefly: (1) The Equation of the Normal Probability Curve, (2) The Area, Ordinates, and Deviates of the Normal Curve, (3) Comparison of the Point Binomial and the Normal Curve, and (4) Fitting a Normal Curve to a Frequency Distribution.

KELLEY, TRUMAN L.: Statistical Method. New York, The Macmillan Company, 1924. Pp. xi and 390.

Chapter II, "Graphic Methods." Of special importance in this chapter are the sections on smoothing data, the growth curve, and the principles underlying graphic portrayal.

Chapter V, "The Normal Probability Distribution," discusses the derivation of the equation of the distribution and the use of the Kelley-Wood Table in a variety of problems.

MODLEY, RUDOLF: How To Use Pictorial Statistics. New York, Harper and
Brothers, 1937. Pp. xvi and 170.




Explains the value of pictorial statistics, the techniques underlying the con- 
struction and use of pictorial charts, and includes a chapter on other graphical 
methods. This book should increase the student’s understanding for interpreting 
such tables common to educational literature, and provide suggestions for their 
use in the physical education program. 


ODELL, C. W.: An Introduction to Educational Statistics. New York, Prentice-Hall, Inc., 1946. Pp. xiii and 269.
Chapter III, “Graphic Presentation of Frequency Distributions,” illustrates 
a variety of methods of graphical presentation. 


SCHORLING, RALEIGH; CLARK, JOHN R. and LANKFORD, FRANCIS G.: Statistics,
Collecting, Organizing, and Interpreting Data, Problem II, pp. 18-43. New York, 
World Book Company, 1943. Pp. iv and 76. 

Presents procedures in organizing and presenting data in a simple, under- 
standable form, suitable for the beginning student. 


THURSTONE, L. L.: The Fundamentals of Statistics. New York, The Macmillan
Company, 1928. Pp. xvi and 237.

Chapter VI, "Smoothing the Frequency Polygon," pp. 39-46, is a discussion
not ordinarily found in an elementary text and will prove valuable to the 
statistical student. 

Chapters XVII, XVIII and XIX, pp. 126-154, involve concepts regarding 
the normal probability curve and will lead to a more complete understanding 
of its properties. 


CHAPTER XVI 


Methods of Scoring Tests 


Early workers in physical education measurement presented to the 
profession a large mass of test material unsuited for use today. 
Correct statistical procedures were not followed, scoring devices 
were set up without regard to the actual performances of students, 
the meaning of given score values was not clear and equating of 
score values in various tests was impossible. 

In setting up a scoring device for any type of data collected in the 
field several very important points must be given consideration. 

1. The purpose for which the scale is to be used is perhaps the first problem. For example, a scale to be used for classification purposes may be one in which only large groups or classes are desired. Thus a very rough scoring device which specifies the following groups may be set up:


Superior ................ Upper 5 per cent
Above average ........... Next 22½ per cent
Average ................. Middle 45 per cent
Below average ........... Next 22½ per cent
Inferior ................ Lower 5 per cent


This type of scale will not be especially helpful in giving to the student an idea of his progress from time to time throughout the year or in serving as a motivating device. The divisions are too large and require too great a change in the student's ability to be noted on the scale. A scale for showing progress from time to time must have many divisions, divisions so small that a slight improvement will be rewarded by a larger score.
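The rough five-group scale can be stated in sigma units. The sketch (our own names) turns each share of the group into a boundary on the normal curve, about ±.60σ and ±1.64σ, and classifies a performance expressed as a z-score. Using the normal curve assumes the distribution of scores is roughly normal.

```python
from statistics import NormalDist

# (label, share of the group), from the top of the distribution down.
CATEGORIES = [("Superior", .05), ("Above average", .225), ("Average", .45),
              ("Below average", .225), ("Inferior", .05)]


def lower_boundaries(categories=CATEGORIES):
    """Lower z-limit of every category except the last."""
    unit, share_above, limits = NormalDist(), 0.0, []
    for label, share in categories[:-1]:
        share_above += share
        limits.append((label, unit.inv_cdf(1 - share_above)))
    return limits


def classify(z, limits=None):
    """Category for a performance z standard deviations from the mean."""
    for label, lower in limits or lower_boundaries():
        if z >= lower:
            return label
    return CATEGORIES[-1][0]


for label, lower in lower_boundaries():
    print(f"{label:<14} from {lower:+.3f} sigma up")
```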




2. The range of performance ability covered by the scale is also 
highly important. In scales used to show progress and improve- 
ment, the highest scale score should be set at a point such that very 
few performances will ever exceed that score. Likewise, the lowest 
scale score should be set in such a way as to include practically all 
of the poorer performances. The range of performance may be a 
controversial point, but the fact still remains that an extremely 
wide range, such as five standard deviations above and below the 
mean, tends to eliminate the higher and lower scores from use. A 
range of three standard deviations on each side of the mean appears 
to be as logical a choice as any since, under normal conditions, only 
three people out of 1000 fail to come within the scoring range. 
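Both points can be made concrete. The sketch computes the share of a normal group falling outside ±kσ (about 2.7 in 1000 for k = 3). It also places a performance on a simple equal-step 100-point scale spanning ±3σ. `even_step_score`, its clamping at the ends, and the reversal for timed events are illustrative choices, not a method given in the text.

```python
from statistics import NormalDist


def share_outside(k):
    """Proportion of a normal group lying beyond +/- k standard deviations."""
    return 2 * (1 - NormalDist().cdf(k))


def even_step_score(x, mean, sd, span=3.0, points=100, lower_is_better=False):
    """Equal-step score: 0 at mean - span*sd, `points` at mean + span*sd.

    Performances outside the range are held at the end points; for timed
    events, pass lower_is_better=True so that faster times score higher.
    """
    z = (x - mean) / sd
    if lower_is_better:
        z = -z
    score = (z + span) / (2 * span) * points
    return max(0.0, min(float(points), score))


print(round(share_outside(3) * 1000, 1), share_outside(5))
print(even_step_score(170.2 + 26.1, 170.2, 26.1))
```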

3. The type of scale is also an important issue. McCloy has 
pointed out that an even-step interval plan of scoring is questionable 
except with homogeneous groups and that for heterogeneous groups 
an increased-increment plan seems advisable.¹ In other words, if a
scale is to be constructed in a particular event for boys of the same 
age-height-weight classification group, an even-step interval plan of 
scoring may be used with safety. If, however, the group under 
consideration includes boys of both the Junior and Senior High 
School, an increased-increment plan of scoring is the logical choice.

4. The data from which the scale is to be constructed must be 
analyzed. If data are badly skewed, changes in the form of the test 
or in the scoring may be necessary. It may be that the test is too 
difficult for the particular group in question and, if so, a large 
number of performances of zero ability will be recorded. Test pro- 
cedure should be revised and there should be set up a means of more 
adequately measuring performance in the lower ranges of ability. 
Correct test procedures will usually produce a distribution which
approaches normality.
The following methods embrace most of the possibilities of scoring physical education test data.

Pass or Fail. This method of scoring provides only a single standard of performance which the student must attain. It is extremely simple to set up, but care must be taken in designating the meaning of the standard. The old Athletic Badge Tests used this method of scoring but failed to set forth the meaning of the standard. For example, in the first test for boys, a standard of … feet 9 inches was set for the Standing Broad Jump. To give meaning to such a standard, a qualifying statement should have been made telling the profession that this was a performance attained by 50, 60 or 70 per cent of twelve-year-old boys.

¹ McCloy, Chas. H., The Measurement of Athletic Power, p. 11. New York, A. S. Barnes and Company, 1932.

Methods of Scoring Tests 311

To illustrate the method to be used in setting up standards of 
performance, the following data are available: 


SENIOR HIGH SCHOOL GIRLS

Event                               Mean             Standard deviation
Basketball throw for distance ..... 43.56 feet       7.84 feet
60-yard dash ...................... 9.35 seconds     .85 seconds
Jump and reach .................... 12.05 inches     2.38 inches


Let us assume that the standard set is such that two-thirds of 
the girls will pass. It should be understood that any standard set 
is purely an arbitrary matter but, if designated, it provides meaning 
to the scale. 

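By referring to Table XXXII, we find that the point where two-thirds of the girls will pass lies at 0.43σ below the mean in quality of performance (for the 60-yard dash, where a lower time is better, this means 0.43σ slower than the mean time). By making the proper computations, the following standards may be set: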

Basketball throw for distance .............. 40.19 feet
60-yard dash ............................... 9.7 seconds
Jump and reach ............................. 11.03 inches


In actual practice in the field, the standards set will probably be designated as 40 feet, 9.7 seconds and 11 inches because the theoretical computations are so close to these figures. It should also be noted that such standards provide equivalent performances in the three events in question.
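The computation of these standards is sketched below; the helper name is our own. For the dash, where a lower time is better, the standard lies on the slow side of the mean. The exact sigma distance (about .431) is used instead of the rounded .43, so the results differ from the printed ones only in the second decimal place.

```python
from statistics import NormalDist

# Senior High School girls: event -> (mean, S.D., lower score is better?)
EVENTS = {
    "basketball": (43.56, 7.84, False),   # throw for distance, feet
    "dash": (9.35, .85, True),            # 60-yard dash, seconds
    "jump": (12.05, 2.38, False),         # jump and reach, inches
}


def passing_standard(mean, sd, pass_share, lower_is_better=False):
    """Performance that `pass_share` of a normal group equals or betters."""
    z = NormalDist().inv_cdf(1 - pass_share)   # about -0.43 for two-thirds
    return mean - z * sd if lower_is_better else mean + z * sd


standards = {name: passing_standard(m, s, 2 / 3, lower)
             for name, (m, s, lower) in EVENTS.items()}
for name, value in standards.items():
    print(f"{name:<10} {value:.2f}")
```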
Another illustration in the pass or fail type of scoring is offered
in the Philadelphia Age Aim Charts. Here an "age aim" un- 
doubtedly refers to the performance level at the average for any 
given group. When it is stated that the “age aim” for twelve-year- 
old boys in the 50-yard dash is eight and one-fifth seconds, that 
Statement is equivalent to saying that the average performance of 
twelve-year-old boys in the 50-yard dash in Philadelphia is eight 
and one-fifth seconds. 

Success or Failure. When the test material is in the nature of 
stunts, illustrated by Brace² in his scale of Motor Ability Tests,


2 
Brace, David K., Measuring Motor Ability. New York, A. S. Barnes and Com- 
Pany, 1927. 


312 Tools of Measurement 


the success or failure method may be used to advantage. Here the 
test material does not lend itself to gradation by an objective 
method. The student does the stunt or fails to do it. In principle 
this method differs but little from the pass or fail method, except 
that in the latter a notation may be made of the individual’s actual 
time, distance or height. Both of these methods fail to show the 
spread of ability within the given event or stunt and will therefore 
have the disadvantage of lacking motivating power. 

Minimum Standards with Additional Point Awards for Better Performances. This type of scoring device, though quite uncommon at the present time, was one of the earliest methods used in physical education and was reported in 1894 by the Gymnastic Societies of Cleveland.³

As an example of the method used in formulating a scale for the 16-pound shot-put, 1 point was given for a minimum put of 18 feet, and 1 point for each additional foot. In all probability little or no statistical consideration was given to equating performances in various events or increments of performance.

Reilly, in his program of Rational Athletics for Boys and Girls, also used this method of scoring.⁴ For example, in the 40-yard dash for Juniors (Class A), he offers 5 points for nine seconds and 2 points for each one-fifth second under the standard.

As in the pass or fail method, minimum standards must here be set up in accordance with recognized statistical procedures; that is, the minimum standard must represent the same relative performance level in all events. Then too, the increments for additional point awards in all events must be relatively the same.

Referring to the Senior High School Girls’ data shown in connec- 
tion with the pass or fail method, let us assume that it was desired 
to set the minimum standard at a performance level where 90 per 
cent of the girls could pass and would be awarded 1 point. Again 
using Table XXXII our minimum standards will be: 


Basketball throw for distance    33.53 feet
60-yard dash                     10.44 seconds
Jump and reach                   9.00 inches

Since it is impractical to measure more accurately in the 60-yard dash than to the nearest one-tenth second, we shall probably be forced to reduce the increases in the other two events to terms comparable with one-tenth second in the dash. Now one-tenth second in the dash is .1/.85, or about .12, of a standard deviation (.12σ). The following approximate equivalents are then available:

                                 Minimum standard       Increase for
Event                            for 90 per cent pass   each 1 point
Basketball throw for distance    33.5 feet              11 inches
60-yard dash                     10.4 seconds           1/10 second
Jump and reach                   9.0 inches             about .3 inch

³"Pentathlon," Mind and Body, Vol. I, No. 8 (October, 1894), 7-10.
⁴Reilly, Frederick J., New Rational Athletics for Boys and Girls, pp. 104-105. New York, D. C. Heath and Company, 1917.
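The minimum-standard plan with additional point awards can be written as a small function. This is a sketch under assumptions: the name `award_points` is hypothetical, it gives 1 point at the minimum standard and 1 more for each full increment beyond it (as in the example above), and awarding 0 points below the minimum is an assumed rule the text does not state.

```python
import math

def award_points(performance, minimum, increment, lower_is_better=False):
    """1 point at the minimum standard plus 1 point per full increment beyond it."""
    # Gain over the minimum, measured in the "better" direction for the event.
    gain = (minimum - performance) if lower_is_better else (performance - minimum)
    if gain < -1e-9:
        return 0  # assumption: no points for failing the minimum standard
    # A small tolerance keeps results such as 0.4 / 0.1 from flooring to 3.
    return 1 + math.floor(gain / increment + 1e-9)

# Approximate equivalents for senior high school girls (each increment about .12 sigma).
print(award_points(37.2, 33.5, 11 / 12))                    # basketball throw, feet
print(award_points(10.0, 10.4, 0.1, lower_is_better=True))  # 60-yard dash, seconds
print(award_points(9.9, 9.0, 0.3))                          # jump and reach, inches
```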


Division into Classes or Groups. It is often desirable in physical education to divide groups into classes such as A, B, C, etc.; or high, average and low; or superior, above average, average, below average, and inferior. In cases of this sort the division should be made on the basis of a recognized statistical procedure. By obtaining either the mean and standard deviation or the median and upper and lower quartiles, the limits of any of the above classes may be set definitely.

For example, we may take the distribution of high school girls in the Basketball Throw for Distance (see Table XII) and work out several sorts of class groups.

M = 43.560      σ = 7.836
Mdn = 43.986    Q₁ = 38.109    Q₃ = 48.806


Suppose we wish to divide girls into three classes, high being the upper 25 per cent, average the middle 50 per cent and low the lower 25 per cent. Using the upper and lower quartiles (Q₃ and Q₁) we have:

High = 48 feet 10 inches and up
Average = 38 feet 1 inch to 48 feet 9 inches
Low = 38 feet 0 inches and below


These classes could be designated as A, B, C, or 1, 2, 3.
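A quartile division like this one takes only a few lines. In this sketch the function name `three_class` is hypothetical, it assumes that larger scores are better (a distance event), and sending a score that falls exactly on a quartile to the higher class is an assumed convention; the text instead rounds the limits to whole inches.

```python
def three_class(score, q1, q3):
    """Classify a distance score as A (high), B (average) or C (low) by the quartiles."""
    if score >= q3:
        return "A (high)"     # upper 25 per cent
    if score >= q1:
        return "B (average)"  # middle 50 per cent
    return "C (low)"          # lower 25 per cent

Q1, Q3 = 38.109, 48.806  # basketball throw for distance, feet
for throw in (50.0, 43.0, 35.5):
    print(throw, three_class(throw, Q1, Q3))
```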

Suppose we wish to form five classes. Limiting the range of ability to three sigma on each side of the mean, each class will be represented by six-fifths of a standard deviation (1.2σ). The middle group or average class will range from .6σ below the mean to .6σ above the mean.

$.6\sigma = .6 \times 7.836 = 4.702$ feet = 4 ft. 8 in.
$.6\sigma + 1.2\sigma = 1.8\sigma = 14.105$ feet = 14 ft. 1 in.

Lower limit of class "above average" = $M + .6\sigma = 43.56 + 4.702 = 48.262$ feet = 48 ft. 3 in.
Lower limit of class "superior" = $M + 1.8\sigma = 43.56 + 14.105 = 57.665$ feet = 57 ft. 8 in.
Lower limit of class "average" = $M - .6\sigma = 43.56 - 4.702 = 38.858$ feet = 38 ft. 10 in.
Lower limit of class "below average" = $M - 1.8\sigma = 43.56 - 14.105 = 29.455$ feet = 29 ft. 5 in.

Our classes would then run as follows:

Superior = 57 feet 8 inches and up
Above Average = 48 feet 3 inches to 57 feet 7 inches
Average = 38 feet 10 inches to 48 feet 2 inches
Below Average = 29 feet 5 inches to 38 feet 9 inches
Inferior = 29 feet 4 inches and under


A scheme of giving points in the various ranges might be worked 
out by allowing 1 point for an inferior performance, 2 for a per- 
formance below average and so on. 
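The five-class computation generalizes to any number of classes and any range in sigma. The sketch below is illustrative only: the helper names `class_limits` and `feet_inches` are hypothetical, and conversion to the nearest whole inch is assumed to match the text's rounding.

```python
def class_limits(mean, sd, n_classes=5, sigmas_each_side=3.0):
    """Lower limits of every class except the lowest, dividing +/- sigmas_each_side equally."""
    width = 2 * sigmas_each_side / n_classes  # 6 sigma / 5 classes = 1.2 sigma
    return [mean + (k * width - sigmas_each_side) * sd for k in range(1, n_classes)]

def feet_inches(feet):
    """Express decimal feet as whole feet and the nearest whole inch."""
    whole = int(feet)
    inches = round((feet - whole) * 12)
    if inches == 12:
        whole, inches = whole + 1, 0
    return f"{whole} ft. {inches} in."

names = ["below average", "average", "above average", "superior"]
limits = class_limits(43.56, 7.836)
for name, value in zip(names, limits):
    print(f"Lower limit of {name}: {value:.3f} feet = {feet_inches(value)}")
```

Changing `n_classes` or `sigmas_each_side` gives the other divisions mentioned in the next paragraph.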

As many classes can be arranged as seem desirable, and while the 
division ought usually to cover a range of three standard deviations 
on each side of the mean, this is by no means a hard and fast rule. 
A range of five sigma could be used with a division of three or five 
classes, and so on according to the situation. 

Standard Scores or Measures. Kelley⁵ and Holzinger⁶ have both used scores which are useful for comparison of measurements on unlike scales. These scores are abstract numbers derived by dividing the deviations from the mean by the respective standard deviations. Thus,

$$\mathrm{S.M.}_1 = \frac{X_1 - M_1}{\sigma_1} \qquad \text{and} \qquad \mathrm{S.M.}_2 = \frac{X_2 - M_2}{\sigma_2}$$

in which S.M. = Standard measure
X = Score on a test
M = Mean
σ = Standard deviation

⁵Kelley, Truman L., Statistical Method. New York, The Macmillan Company, 1924.
⁶Holzinger, Karl J., op. cit.


Thus, if we have a number of different tests in original scores in 
which the means and standard deviations are unlike, we may reduce 
the scores on all tests to comparable measures and add them to 
obtain a composite figure for comparison with other individuals. 
To illustrate, the following table is presented. 


TABLE XX
STANDARD SCORES OF A COLLEGE MAN ON SEVERAL ATHLETIC ABILITY TESTS

                                                     Score of                    S.M. =
Test                 Mean           S.D. (σ)         student         X − M       (X − M)/σ
1. Dips              7.78           3.27             6               −1.78       −.545
2. Shot-put          28.07 ft.      4.25 ft.         29 ft. 2 in.    +1.10       +.259
3. 100-yard dash     12.602 sec.    .734 sec.        12.9 sec.       −.298       −.406
4. Bar snap          61.3 in.       14.5 in.         74 in.          +12.7       +.876
5. Jump and reach    18.75 in.      2.96 in.         20.5 in.        +1.75       +.591
6. Long dive         8 ft. 1.9 in.  28.15 in.        8 ft. 4 in.     +2.1        +.075
7. Half-mile run     2 min. 47.9 s. 13.6 sec.        2 min. 52.4 s.  −4.5        −.331
Total standard score (or measure)                                                +.519

In the timed events (tests 3 and 7) the sign of X − M is reversed, since a lower time is the better performance.
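The composite in Table XX can be reproduced with a short script. This is a sketch under assumptions: the function name `standard_measure` is hypothetical, feet-and-inches and minutes-and-seconds are converted to single units before computing, the first test is labelled generically, and the sign is reversed for timed events as in the table.

```python
def standard_measure(score, mean, sd, lower_is_better=False):
    """S.M. = (X - M) / sigma, with the sign reversed for timed events."""
    deviation = (mean - score) if lower_is_better else (score - mean)
    return deviation / sd

# (mean, sd, student's score, lower_is_better), each in a single unit.
tests = {
    "test 1": (7.78, 3.27, 6, False),
    "shot-put (ft)": (28.07, 4.25, 29 + 2 / 12, False),
    "100-yard dash (sec)": (12.602, 0.734, 12.9, True),
    "bar snap (in)": (61.3, 14.5, 74.0, False),
    "jump and reach (in)": (18.75, 2.96, 20.5, False),
    "long dive (in)": (8 * 12 + 1.9, 28.15, 8 * 12 + 4, False),
    "half-mile run (sec)": (2 * 60 + 47.9, 13.6, 2 * 60 + 52.4, True),
}

scores = {name: standard_measure(x, m, sd, low) for name, (m, sd, x, low) in tests.items()}
for name, sm in scores.items():
    print(f"{name:22s} {sm:+.3f}")
total = sum(scores.values())
print(f"{'total':22s} {total:+.3f}")
```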


The T-score. An extremely convenient score to use in the scaling of tests is one made popular by McCall, which he calls the T-scale. T-scores are based on the standard deviation of the distribution in question and range from 0 to 100. The zero point on the scale is taken as the performance at 5σ below the mean and the upper limit (100 points) at 5σ above the mean. One T-score point represents .1σ; the mean is 50, and each 10 points above or below the mean is 1σ. McCall's original scale was worked out on unselected twelve-year-olds, but because standard measures are comparable within the same group, T-scores may be applied to any homogeneous group. They may be derived in one of two ways:

(a) $T = 50 + \dfrac{10(X - M)}{\sigma}$

(b) By a nontechnical method described by McCall, in which

Column I represents the distribution intervals;
Column II represents the frequencies;
Column III represents the number of scores exceeding a given performance;
Column IV represents half of those making scores in a given interval; and
Column V represents the per cent of those exceeding (Column III) plus half of those reaching a particular interval (Column IV), found by dividing Column III plus Column IV by N (the number of cases in the distribution) and multiplying by 100.

The scale is then read from Column V by a table⁷ showing the S.D. distance of a given per cent above zero. Each S.D. value is multiplied by 10 to eliminate decimals, with the zero point at 5σ below the mean.

The mean and standard deviation of the 100-yard dash data used to illustrate the methods employed in (a) and (b) are:

M = 12.602 seconds
σ = .734 seconds

TABLE XXI
Records of Unselected College Freshmen and Sophomores in the 100-yard Dash without Practice and Instruction*

Column I = time interval in seconds (the superscript gives fifths of a second)
Column II = frequency
(a) = (X − M)/σ, signed so that faster times are positive, and T(score) = 50 + 10(X − M)/σ
Column III = number exceeding (faster than) the interval
Column IV = half of those in each interval
Column V = per cent exceeding plus half of each interval = 100(III + IV)/N, with N = 223
(b) = T-score read from Column V

          Column   (a)                  Column   Column   Column    (b)
Column I  II       (X − M)/σ  T(score)  III      IV       V         T-score
11⁰        1       +2.18      72          0        .5       .224    78
11¹        2       +1.91      69          1       1.0       .897    74
11²        9       +1.64      66          3       4.5      3.36     68
11³       17       +1.37      64         12       8.5      9.19     63
11⁴       11       +1.09      61         29       5.5     15.47     60
12⁰       13        +.82      58         40       6.5     20.85     58
12¹       19        +.55      56         53       9.5     28.00     56
12²       33        +.28      53         72      16.5     39.70     53
12³       30         .00      50        105      15.0     53.80     49
12⁴       27        −.27      47        135      13.5     66.60     46
13⁰       14        −.54      45        162       7.0     75.80     43
13¹       13        −.82      42        176       6.5     81.90     41
13²       11       −1.09      39        189       5.5     87.25     39
13³        7       −1.36      36        200       3.5     91.25     36
13⁴        3       −1.63      34        207       1.5     93.55     35
14⁰        1       −1.90      31        210        .5     94.50     34
14¹        6       −2.18      28        211       3.0     96.55     32
14²        3       −2.45      25        217       1.5     98.10     29
14³        1       −2.72      23        220        .5     98.88     27
14⁴        1       −2.99      20        221        .5     99.33     25
15⁰        0       −3.27      17        222       0       99.55     24
15¹        0       −3.54      15        222       0       99.55     24
15²        1       −3.81      12        222        .5     99.77     22

*This table assumes a normal distribution of performance, and in proportion as the distribution approaches normality the T-scores in (a) and (b) will agree. It will be noted that this distribution is skewed negatively, that is, it has the "tail" of the curve on the left. This tends to increase the S.D. and hence has a very great influence on the T-score values in (a). Near the mean the T-scores obtained by the two methods agree rather closely.

⁷See Table XXXV in the Appendix.
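Both T-score methods can be checked against Table XXI with a short script. This sketch assumes a normal curve and uses `statistics.NormalDist().inv_cdf` in place of Table XXXV; the function names are hypothetical, and the intervals are entered as 11.0, 11.2, ..., 15.4 seconds (steps of one-fifth second).

```python
from statistics import NormalDist

MEAN, SD = 12.602, 0.734
# Table XXI intervals, 11.0 to 15.4 seconds in fifths of a second, and their frequencies.
times = [11 + 0.2 * k for k in range(23)]
freqs = [1, 2, 9, 17, 11, 13, 19, 33, 30, 27, 14, 13, 11, 7, 3, 1, 6, 3, 1, 1, 0, 0, 1]

def t_score_a(time):
    """Method (a): T = 50 + 10(X - M)/sigma, sign reversed so faster times score higher."""
    return 50 + 10 * (MEAN - time) / SD

def t_scores_b(freqs):
    """Method (b): per cent exceeding plus half the interval, turned into a T-score."""
    n = sum(freqs)
    scores, exceeding = [], 0
    for f in freqs:
        p = (exceeding + f / 2) / n       # Column V as a proportion
        z = NormalDist().inv_cdf(1 - p)   # sigma distance from the mean (normal assumption)
        scores.append(50 + 10 * z)
        exceeding += f                    # Column III for the next (slower) interval
    return scores

t_b = t_scores_b(freqs)
for time, f, tb in zip(times, freqs, t_b):
    print(f"{time:5.1f}  {f:3d}  T(a) = {t_score_a(time):5.1f}  T(b) = {tb:5.1f}")
```

Rounded to whole numbers, the two columns reproduce the (a) and (b) T-scores of the table closely, and they diverge in the tails as the table note explains.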




Even-step Interval. As has been previously pointed out, this plan of scoring presupposes a homogeneous group. As in the T-score (Plan a), the range of scoring is divided into an equal number of parts (usually 100) and also is limited to a given number of standard deviations above and below the mean. In the case of a range of 100 scores, a score of 50 equals the performance at the mean, and a score of 100 the performance at the limit set above the mean (as, for example, at 3σ).

Though a number of studies⁸ have logically used a range of three standard deviations on each side of the mean, it is not necessary to adhere strictly to this figure. Any reasonable range may be used, but as the range approaches 5 sigma on each side of the mean, it will be found that scores above 80 and below 20 will be used rarely, if ever. In such instances the range is not one of 100 scores but of 60. One of the objections to the use of the T-score is just this. However, the T-score has the advantage of ready calculation of a performance level in terms of standard deviation value above and below the mean. For example, as soon as a score of 65 is mentioned, we know immediately that the performance in question is 1.5 standard deviations above the mean.

In setting up an even-step interval scoring plan with a range of 100 scores and of three standard deviations on each side of the mean, divide the range in sigmas (6σ) by 100, find this increment in terms of the distribution, and add it to or subtract it from the mean for each score desired. The following data on the 75-yard dash for eighth grade boys will serve to illustrate the procedure:

M = 11.36 seconds
σ = .966 seconds

$$\text{Increment} = \frac{\text{Range in sigmas} \times \sigma}{\text{Range in scores}} = \frac{6 \times .966}{100} = .05796 \text{ seconds}$$


On the finished scale only the time to the nearest one-tenth second need be shown. However, for purposes of accurate computation, the theoretical times must be kept as shown in order that cumulative errors may be avoided.

TABLE XXII
EVEN-STEP INTERVAL SCORING PLAN

        Theoretical  Nearest                Theoretical  Nearest
Score   time         1/10 second    Score   time         1/10 second
etc.
60      10.780       10.8           50      11.360
59      10.838                      49      11.418       11.4
58      10.896       10.9           48      11.476       11.5
57      10.954                      47      11.534
56      11.012       11.0           46      11.592       11.6
55      11.070                      45      11.650
54      11.128       11.1           44      11.708       11.7
53      11.186       11.2           43      11.766
52      11.244                      42      11.824       11.8
51      11.302       11.3           41      11.882       11.9
                                    etc.

⁸Neilson, N. P. and Cozens, Frederick W., Achievement Scales in Physical Education Activities for Boys and Girls of the Elementary and Junior High School. New York, A. S. Barnes and Company, 1934.
Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., Physical Education Achievement Scales for Boys in Secondary Schools. New York, A. S. Barnes and Company, 1936.
Cozens, Frederick W., Achievement Scales in Physical Education Activities for College Men. Philadelphia, Lea and Febiger, 1936.
Cozens, Frederick W., Cubberley, Hazel J. and Neilson, N. P., Achievement Scales in Physical Education Activities for Secondary School Girls and College Women. New York, A. S. Barnes and Company, 1937.
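The even-step computation, together with the rounded tenth-second column, can be sketched as follows. The names are hypothetical, and the rule that each even tenth of a second is printed beside the score whose theoretical time lies nearest to it is an inferred assumption, chosen because it reproduces Table XXII.

```python
MEAN, SD = 11.36, 0.966          # 75-yard dash, eighth grade boys
INCREMENT = 6 * SD / 100         # 6 sigma spread over 100 scores

def theoretical_time(score):
    """Score 50 is the mean; each score step is one increment faster or slower."""
    return MEAN - (score - 50) * INCREMENT

def tenth_second_labels(scores):
    """Attach each even tenth of a second to the score whose theoretical time is nearest."""
    times = {s: theoretical_time(s) for s in scores}
    first = round(min(times.values()) * 10)
    last = round(max(times.values()) * 10)
    labels = {}
    for tenth in range(first, last + 1):
        nearest = min(scores, key=lambda s: abs(times[s] - tenth / 10))
        labels[nearest] = tenth / 10
    return labels

scores = range(41, 61)
labels = tenth_second_labels(scores)
for s in sorted(scores, reverse=True):
    print(s, f"{theoretical_time(s):.3f}", labels.get(s, ""))
```

Because every time is computed from the mean rather than by adding rounded steps, no cumulative error builds up, which is the point made in the text.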

Increased Increment. The principle upon which the increased-increment plan of scoring is built involves a consideration of the general idea that an increasing number of points for equal performance increases (such as 1 foot, 1 inch or one-tenth second) should be awarded as performances become more difficult. Two previous studies⁹ indicate that the equation to be used in the computation of scoring tables for an increased-increment plan of scoring is parabolic in nature and approaches the formula $Y = KX^2$. These studies also emphasize the point that the increased-increment plan of scoring applies to heterogeneous groups. McCloy has pointed out that the exponent to be used in different events varies between 1 and 3 according to the event in question. However, the general average of exponents when types of events are taken into consideration (that is, runs, hurdles, jumps, vaults, weight events, throws and climbs) will be close to 2, or $Y = X^2$. While the formula $Y = KX^2$ (where K is any constant) may not be the exact one to use in any given event, it is undoubtedly close to a best fit for all types of events, and the computations can be made simple by the use of a slide rule and a table of squares.

⁹McCloy, Charles H., The Measurement of Athletic Power, 19-37. New York, A. S. Barnes and Company, 1932.
Cozens, Frederick W., "A Curve for Devising Scoring Tables in Physical Education," Res. Quart. Am. Phys. Educ. Assoc., Vol. II, No. 4 (December, 1931), 67-75.

McCloy¹⁰ has prepared tables for a wide range of events showing a progressive increase in the number of points to be added for each increase in performance. In computing formulas for each event the relative horsepower developed was given consideration. The scoring tables have a range of 1000 points with the approximate world's record set at 900, and assume that field events begin at zero performance and track events at zero velocity. The zero point on the scale, however, is set above this zero performance and velocity so as to include the poorest performances of nine- and ten-year-old boys within the limits of the scale. All careful students of physical education measurement should familiarize themselves with McCloy's method of approach to the problem of increased-increment scales.

Figure 20 presents graphically an illustration of the point previously made regarding the use of even-step interval and increased-increment scoring plans. It will be noted that the portions of the curve used in an even-step interval plan for a homogeneous group, such as the portions A-B, C-D and E-F, approximate a straight line, and that therefore, whenever homogeneous groups are selected, even-step interval plans of scoring are valid. This figure also illustrates the fundamental principles of the method here presented for computing increased-increment scoring plans. Though the origin of the curve is always located at 5σ below the mean, the portion of the curve used for scoring purposes depends entirely upon the problem at hand.

¹⁰McCloy, Charles H., op. cit.

In order to illustrate this method of computation for increased-increment scoring plans, three types of problems will be outlined. The principles to be kept in mind throughout all computations are as follows:

1. That with the data at hand (from a heterogeneous group), the mean and standard deviation must be computed.

2. That the origin of the curve, $Y = KX^2$, regardless of where the scoring scheme starts or where the maximum point awards are given, should be at 5 standard deviations below the mean. This starting point was selected because performances below this point are practically nil. Theoretically, only three persons in 10,000,000 will ever perform at $M - 5\sigma$ or below.

3. Points on the X-axis must be of a nature which can be used for any event in question; that is, these points must mean the same thing relatively for any event. One very good measure which has this meaning is the standard deviation, and standard deviations have therefore been chosen for X values.

4. X and Y values are shown in Table XXIII.

5. The value K may be any quantity, depending entirely upon the point value which the scale maker may wish to set at a particular performance level, as, for example, at 2½ or 3 S.D. above the mean.

Fig. 20. The curve Y = KX² (origin at 5σ below the mean) used in computing increased-increment scoring tables; marked portions include scoring from −3σ to +3σ and the world's record (Problem III).

With these principles in mind it now remains to illustrate the types of problems to be solved.

Problem 1. To set up an increased-increment scoring plan with 0 points at the origin (−5 sigma) and 1000 points at 3½ sigma above the mean.


TABLE XXIII
VALUES FOR THE COMPUTATION OF INCREASED-INCREMENT SCORING TABLES

X at        X value equals    Since Y = KX², Y then equals
−5 S.D.     0                 0
−4 S.D.     1                 1K
−3 S.D.     2                 4K
−2 S.D.     3                 9K
−1 S.D.     4                 16K
Mean        5                 25K
+1 S.D.     6                 36K
+2 S.D.     7                 49K
+3 S.D.     8                 64K
+4 S.D.     9                 81K
+5 S.D.     10                100K


Data for 50-yard dash, Elementary, Junior, and Senior High School Boys:
M = 7.84 seconds
S.D. = .69 seconds

Since the performance level at 3½ sigma equals 1000 points, a value may be computed for K from the equation $Y = KX^2$. X in this particular instance is 8.5, since the origin is at 5 sigma below the mean and to this must be added the performance level above the mean (3½σ).

Substituting, $1000 = K(8.5)^2$, or $K = 13.84$.

By determining the time at 3½ sigma ($7.84 - 3.5 \times .69 = 5.425$ seconds), we can then tell which even tenth is just below 1000 points. In this instance 5.5 seconds is the first performance with a point value just under that of 3½ sigma.

Next determine how far in sigma value 5.5 seconds is above the mean (2.34 seconds faster). This computation gives us $\frac{2.34}{.69}$, or 3.39 sigma.

Hence the X value for 5.5 seconds is 5 + 3.39, or 8.39. Substituting in the equation for this problem, we have

Point value for 5.5 seconds = $13.84 \times (8.39)^2 = 974$


A table of squares and a slide rule make these operations very simple.

In like manner, point values for all even tenth seconds over the entire range may be computed.

To simplify the computations, X may be found more easily each time by subtracting the increment in S.D. value for one-tenth second:

S.D. value for 1/10 second = $\frac{0.1}{.69} = 0.145$

Point value for 5.6 seconds = $13.84 \times (8.39 - 0.145)^2 = 941$
Point value for 5.7 seconds = $13.84 \times (8.245 - 0.145)^2 = 908$
Point value for 5.8 seconds = $13.84 \times (8.10 - 0.145)^2 = 876$
etc.
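Problem 1 can be computed directly. In this sketch the constant and function names are hypothetical, and X is computed without intermediate rounding, so individual point values may differ from the hand-computed ones above by about a point.

```python
MEAN, SD = 7.84, 0.69        # 50-yard dash, boys
ORIGIN_SIGMA = 5.0           # curve origin at 5 sigma below the mean
TOP_SIGMA = 3.5              # 1000 points at 3.5 sigma above the mean
K = 1000 / (ORIGIN_SIGMA + TOP_SIGMA) ** 2   # from 1000 = K * 8.5**2

def x_value(time):
    """Sigma distance above the origin; faster times lie further above it."""
    return ORIGIN_SIGMA + (MEAN - time) / SD

def points(time):
    """Y = K * X**2."""
    return K * x_value(time) ** 2

for tenths in range(55, 59):
    t = tenths / 10
    print(f"{t:.1f} seconds: {points(t):.0f} points")
```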

Problem 2.¹¹ To set up an increased-increment scoring plan with 0 points at (−3 sigma) and 1000 points at (+3 sigma).

X value at (−3 sigma) = 2
X value at (+3 sigma) = 8

Here we have a situation involving simultaneous equations, and to solve them we must have a factor such as S which will represent the quantity to be subtracted from the value of Y, because no scoring occurs below (−3 sigma).

The general formula will be $Y = KX^2 - S$.

Since we are limiting the scoring range at both ends of the scale, we shall have two equations for the solution of K and S, the two unknown quantities:

At (−3 sigma), where X = 2: $Y = 4K - S = 0$
At (+3 sigma), where X = 8: $Y = 64K - S = 1000$

Subtracting the first equation from the second gives 60K = 1000, so that K = 16.67 and S = 66.67.¹²
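The pair of equations can be solved for any choice of zero and top points. This is a minimal sketch with hypothetical function names; it subtracts one equation from the other exactly as above and then applies $Y = KX^2 - S$ to the 50-yard dash data.

```python
def fit_scoring_curve(x_zero, x_top, top_points=1000.0):
    """Solve K*x_zero**2 - S = 0 and K*x_top**2 - S = top_points for K and S."""
    k = top_points / (x_top ** 2 - x_zero ** 2)  # subtracting the two equations
    s = k * x_zero ** 2                         # back-substituting in the first
    return k, s

def points(time, k, s, mean=7.84, sd=0.69):
    """Y = K*X**2 - S, with X measured in sigmas above the origin at -5 sigma."""
    x = 5 + (mean - time) / sd
    return k * x ** 2 - s

k3, s3 = fit_scoring_curve(2, 8)        # 0 points at -3 sigma, 1000 at +3 sigma
k25, s25 = fit_scoring_curve(2.5, 7.5)  # 0 points at -2.5 sigma, 1000 at +2.5 sigma
print(f"M +/- 3 sigma:   K = {k3:.2f}, S = {s3:.2f}")
print(f"M +/- 2.5 sigma: K = {k25:.2f}, S = {s25:.2f}")
print(f"5.9 seconds on the 3-sigma scale: {points(5.9, k3, s3):.0f} points")
```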

Using the 50-yard dash data originally cited, we have, in the case of the range M ± 3 sigma,

Point value for 5.8 seconds = $[(7.96)^2 \times 16.67] - 66.67 = 990$
Point value for 5.9 seconds = $[(7.815)^2 \times 16.67] - 66.67 = 951$

When computing point values for a scoring range of M ± 2½ sigma,

Point value for 6.2 seconds = $[(7.375)^2 \times 20] - 125 = 963$
Point value for 6.3 seconds = $[(7.23)^2 \times 20] - 125 = 920$

¹¹In connection with this problem it should be noted in Fig. 20 that, regardless of the fact that the curve starts at −5σ, only that portion of the curve lying between the lines PP and P'P' is used for scoring purposes. These lines cut the curve at 3σ below and above the mean respectively.

¹²If it is desired to limit the scoring range of 1000 points to 2½ sigma on each side of the mean, our equations become

$Y = 6.25K - S = 0$
$Y = 56.25K - S = 1000$

in which we find S = 125 and K = 20.
Problem 3. It is often desired to have an increased-increment scoring plan such that 1000 points equals the world's record in any given event. In such a situation it is necessary to establish some level of performance for 0 points in order to set up two equations. This is the type of scoring plan which should be used by the A.A.U. in setting up their scoring tables for the Pentathlon and Decathlon.


For the 50-yard dash data we would have

Approximate world's record = 5.2 seconds
M = 7.84 seconds
σ = .69 seconds

World's record = 3.83 sigma above the mean; X value = 8.83.

The selection of a performance level for 0 points is entirely an arbitrary matter, but it should be set at the same position for every event, and at such a level that it can be attained by practically all performers in the group from which data are obtained. For the sake of illustrating the problem, the zero score value has been chosen at (−2½ sigma), where X = 2.5.

The two equations then necessary are:

$Y = 1000 = (8.83)^2 K - S$, that is, $77.97K - S = 1000$
$Y = 0 = (2.5)^2 K - S$, that is, $6.25K - S = 0$

from which K = 13.94 and S = 87.12.

Computations can then be made as follows:

Point value for 5.2 seconds = 1000 (set by premise)
Point value for 5.3 seconds = $[(8.68)^2 \times 13.94] - 87.12 = 963$
Point value for 5.4 seconds = $[(8.536)^2 \times 13.94] - 87.12 = 929$
Point value for 5.5 seconds = $[(8.39)^2 \times 13.94] - 87.12 = 894$
. . .
Point value for 8.5 seconds = $[(5 - .957)^2 \times 13.94] - 87.12 = 141$
Point value for 8.6 seconds = $[(5 - 1.10)^2 \times 13.94] - 87.12 = 125$
. . .
Point value for 9.4 seconds = $[(5 - 2.26)^2 \times 13.94] - 87.12 = 18$
Point value for 9.5 seconds = $[(5 - 2.41)^2 \times 13.94] - 87.12 = 6$
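A world's-record scale of this kind can be generated for every tenth of a second in one pass. The sketch below assumes the same data and the same zero point at −2½ sigma; its names are hypothetical, and because it does not round X or K along the way, some values may differ from the hand computations above by a point or so.

```python
MEAN, SD = 7.84, 0.69
WORLD_RECORD = 5.2        # approximate world's record, seconds
ZERO_SIGMA_BELOW = 2.5    # 0 points at 2.5 sigma below the mean

def x_value(time):
    """Sigma distance above the origin at -5 sigma."""
    return 5 + (MEAN - time) / SD

x_top = x_value(WORLD_RECORD)          # about 8.83
x_zero = 5 - ZERO_SIGMA_BELOW          # 2.5
K = 1000 / (x_top ** 2 - x_zero ** 2)  # from the two simultaneous equations
S = K * x_zero ** 2

def points(time):
    """Y = K*X**2 - S."""
    return K * x_value(time) ** 2 - S

table = {t / 10: round(points(t / 10)) for t in range(52, 96)}  # 5.2 to 9.5 seconds
for time in (5.2, 5.3, 5.4, 5.5, 8.5, 8.6, 9.4, 9.5):
    print(f"{time:.1f} seconds: {table[time]} points")
```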




An X value for any particular point is found by obtaining its 
distance from the mean in standard deviations, adding this to 5 if 
the performance is above the mean, or, subtracting it from 5 if the 
performance is below the mean. 

It is interesting to note that, in the particular problem at hand, a decrease of one-tenth second at the lower end of the scale brings an increase of 12 points, and that this point value gradually mounts with each one-tenth second decrease until at the upper end we have an increase per tenth of about 35 points.

Scoring tables for all events set up in the above manner must of necessity have the same point value representing equivalent performance values.


Selected References

GARRETT, HENRY E.: Statistics in Psychology and Education, Second Edition, pp. 143-167. New York, Longmans, Green and Company, 1945. Pp. xiv and 493.
    Problems involving (1) the arrangement of test items according to difficulty, (2) the conversion of judgments into scores, and (3) the scaling of total scores on a test are considered.

HOLZINGER, KARL J.: Statistical Methods for Students in Education, pp. 118-122, pp. 224-230. Boston, Ginn and Company, 1928. Pp. viii and 372.
    These passages discuss the use of standard scores and the scaling of test questions.

KELLEY, TRUMAN L.: Statistical Method, Chapter VI, pp. 109-122. New York, The Macmillan Company, 1924. Pp. xi and 390.
    The sections on "Standard Measures" and "Equivalence of Successive Percentiles" are especially valuable.

McCALL, WILLIAM A.: Measurement. New York, The Macmillan Company, 1939. Pp. xv and 534.
    Book Eight contains four chapters which deal with how to scale tests and compute statistical measures. See especially the chapter on "Scales and Their Construction."

REMMERS, H. H. and GAGE, N. L.: Educational Measurement and Evaluation, Chapters XXI and XXII. New York, Harper and Brothers, 1943. Pp. ix and 580.
    Beginning with an explanation of the meaninglessness of raw scores, these two chapters describe simple statistical procedures basic to interpreting and understanding scores.


PART III

Theory and Practice of Test Administration

This part of the text is not designed for use by students in a first course in Physical Education Tests and Measurements. Although a certain amount of the material presented here will be quite understandable to anyone, it leads rapidly into more advanced procedures which would hopelessly confuse the student who is for the first time being introduced to statistical techniques.

The material in Part III is designed only for advanced students confronted with problems involving test construction and should be used only in graduate courses, and then only when students have a basic understanding of elementary statistics.


CHAPTER XVII

Criteria for Selecting Tests

Until a few years ago, most of the published material relating to tests and measurements in physical education revealed the fact that the work had been done without any particular consideration given to the problem of what constitutes a good test. The test material was set up empirically without regard to validity or recognized statistical techniques in handling data. Our more recent workers, however, have been trained in the scientific approach to problems of test construction, and much of our present-day experimentation is able to stand the close scrutiny of well-informed people in the testing field. It therefore behooves all teachers of physical education to acquaint themselves at least with developments in the testing field so that they may be able to understand published material in their own field.

Authorities in testing and measuring are rather well agreed as to what are the most important criteria to be kept in mind in the choice of tests.¹ These include: (1) validity, (2) reliability, (3) objectivity, (4) administrative economy, (5) the use of norms, (6) duplicate forms, (7) standardized directions.

1. Validity. The first question we should ask about a test is, "Does the test measure what it purports to measure, or what we wish to measure, or how well does it measure what it claims to measure?" Validity may be determined by the correlation of scores on the test with some outside criterion, or criteria, of the ability in question. The test should provide a sampling of the skills which are important in that ability. Hence the ability must be … and a statement made concerning how the material was collected. Thus, in validating his test of general athletic ability for high school boys, Rogers² correlated his tests of muscular strength with an athletic index obtained by weighting and combining scores made on athletic ability tests; he showed that the average athlete has a strength index not reached by 95 per cent of the boys of his own group, and further demonstrated that the best athletes score so high in tests of muscular strength that their strength indices are not reached by 99 per cent of their fellows.

Brace³ used three criteria for validating his tests: (1) judgment ratings on a large group of boys and girls; (2) scores on a variety of athletic events; (3) a comparison of scores of athletic team members with those of a large group of pupils of which they were a part.

Cozens⁴ used the following criteria: … score is only reached by … group; (4) the fact of a … test battery was established by obtaining a … .967 between it and the composite or criterion …

As a partial means of validation it would be entirely possible to pick certain items in a test by means of what is called discriminative power.⁵ In other words, the items used in a test battery are those which best distinguish between levels of ability or achievement in a scale. Suppose, for example, that one wished to construct a general athletic ability test for high school boys. One of the general elements which should be included in such a test would undoubtedly be "Arm and Shoulder-girdle Coordination." Under this heading there are a number of possible tests which might be used, among which may be listed Playground Baseball Throw for Distance, Basketball Throw for Distance and Football Pass for Distance. In selecting one of these as an item in a test battery, consideration will be given to the intercorrelations of these tests with other items proposed for the battery. Some consideration should also be given to how each of them may distinguish between the abilities of various age-height-weight groupings, called Class A, Class B, Class C, etc. Using the technique for establishing the reliability of the differences between the means of each group in each of the tests,⁶ the test having the best discriminative power can be determined.

¹See Selected References at the end of the chapter.
²Rogers, Frederick Rand, Physical Capacity Tests in the Administration of Physical Education. New York, Teachers College, Columbia University, Contributions to Education No. 173, 1925.
³Brace, David K., Measuring Motor Ability. New York, A. S. Barnes and Company, 1927.
⁴Cozens, Frederick W., The Measurement of General Athletic Ability in College Men. Eugene, University of Oregon Press, 1929.
⁵Greene, Harry A. and Jorgensen, Albert N., The Use and Interpretation of High School Tests, p. 138. New York, Longmans, Green and Company, 1936.

These examples may give the reader an insight into the problems 
of establishing the validity of tests and test batteries. 
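The comparison of group means mentioned in the discussion of discriminative power rests on the standard error of a difference, $\sigma_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$, and on the rule that a ratio $D/\sigma_D$ of 3 or more marks a reliable difference. The sketch below is illustrative only: it assumes two independent groups, takes the standard error of each mean as $\sigma/\sqrt{N}$ (a standard result the text does not state here), and uses hypothetical figures rather than data from the text.

```python
import math

def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """D / sigma_D for two independent groups."""
    # Standard error of each mean: sigma / sqrt(N) (assumed, not given in the text).
    se1 = sd1 / math.sqrt(n1)
    se2 = sd2 / math.sqrt(n2)
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)  # standard error of the difference
    return (mean1 - mean2) / sigma_d

# Hypothetical throws (feet) for two classes; not data from the text.
ratio = critical_ratio(150.0, 20.0, 100, 140.0, 20.0, 100)
print(f"D / sigma_D = {ratio:.2f}", "reliable" if abs(ratio) >= 3 else "not reliable")
```

Among candidate items, the one yielding the largest ratios between adjacent classes would have the best discriminative power.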

2. Reliability. The reliability of a test may be defined as the
extent to which the measuring instrument gives a constant score
for a constant degree of that which it measures. In other words,
the reliability of a test refers to the accuracy of the test itself, how
accurately it measures whatever it may measure, not how accurately
it measures what it is supposed to measure. In general the reliability
of a test is computed by correlating one form of the test with another.
In many phases of testing in physical education this is neither necessary
nor practical. A simple repetition of the identical skill under the same
conditions as previously used will usually yield a correlation coefficient
showing a high degree of reliability. “Practice effect” between
testings may tend to reduce the correlation slightly, but under ordinary
conditions the practice effect is the same for all students. It should
be observed here that in particular groups of physical education
tests, such as those used in throwing balls at a target, the “measuring
stick” and equipment are ordinarily not refined enough, and hence
the reliability coefficient will be low. In accuracy throwing tests, a
number of factors will influence performance. These may include


6The formula for determining the standard error of a difference between two uncorrelated means is
$\sigma_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$
When D, the actual difference between the means, divided by its standard error ($\sigma_D$) is 3 or more, complete reliability of the difference is indicated.


330 Theory and Practice of Test Administration 


such things as size and shape of target, size of ball in relation to the
size of hand, condition of throwing arm, air currents, condition of
ball which is thrown, state of vision, etc. A case of increase in relia-
bility occasioned by a change in size and type of target may be cited
in the following example.⁷ A reliability coefficient of .189 was
obtained in “baseball throw for strike” when the pitching distance
was 60 feet and the target was a rectangle 17 × 34 inches painted on
a handball court wall, with the bottom of the target 20 inches above
the surface of the court; the number of cases was 60. A concentric circle
target (5 circles) was then constructed, with the circles varying in diameter
from 1 to 5 feet and hits counting 5, 4, 3, 2, and 1 points, the smallest
circle scoring highest. With 60 cases the reliability of this
test was computed as .608, a very substantial increase. Ten
balls were thrown in each case, but in the first test only hits were
counted and no account was kept of the nearness of the “misses” to
the rectangle.
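The graded target scoring and the repetition method of estimating reliability can be sketched together in Python. This is a minimal illustration, not the procedure of the original study: it assumes five circles of 1, 2, 3, 4 and 5 feet in diameter (the text gives only the range), assumes each throw is recorded as its distance from the target center, and uses made-up trial totals for five pupils. The helper names are hypothetical.

```python
import math

# Assumed layout: circles 1, 2, 3, 4 and 5 feet in diameter (radii 0.5 to
# 2.5 ft), scored 5, 4, 3, 2, 1 from the center outward.
RADII_AND_POINTS = [(0.5, 5), (1.0, 4), (1.5, 3), (2.0, 2), (2.5, 1)]


def score_throw(distance_from_center_ft: float) -> int:
    """Points for one throw; a throw on a line counts for the inner circle,
    and a throw outside the largest circle scores 0."""
    for radius, points in RADII_AND_POINTS:
        if distance_from_center_ft <= radius:
            return points
    return 0


def trial_score(distances: list[float]) -> int:
    """Total points for one trial (for example, ten throws)."""
    return sum(score_throw(d) for d in distances)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation between two paired lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical totals for five pupils on two repetitions of the same test.
first = [32, 25, 41, 18, 29]
second = [30, 27, 38, 20, 31]
print(f"test-retest reliability r = {pearson_r(first, second):.3f}")
```

Correlating the first and second repetitions in this way is the "simple repetition" method described above; with real data the number of cases would be far larger than five.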

Significance of Reliability. It is highly important that the meaning 
of reliability be thoroughly understood. When a score of 50 is 
obtained on a particular test, the novice in test construction may 
be inclined to consider this score as final and the absolute index of 
the individual’s ability. However, if the test is repeated or another 
form of it given, the same individual may score 53 or 48 or 51. As 
yet it has been impossible to construct a test of perfect reliability 
so that in repetitions individuals will score exactly the same in one 
attempt as they do in another. We expect some fluctuation in scores 
and a statistical device has been formulated for calculating the limits 
within which a true score may fall. The actual amount resulting 
from the use of this device is called the standard error of estimate, or 
by some writers the standard error of measurement. The formula for 


the standard error of measurement is written
$\sigma_{(\text{meas.})} = \sigma_{(\text{dist.})}\sqrt{1 - r_{11}}$
where (meas.) = measurement, (dist.) = distribution, and $r_{11}$ = the reliability coefficient.


Hence, if the reliability coefficient of a test is .90 and the standard 
deviation of the distribution 10, we have 


$\sigma_{(\text{meas.})} = 10\sqrt{1 - .90} = 3.16$
If the student scores 50 on the particular test in question, we


7Cozens, Frederick W., op. cit., p. 144. 




may say that the chances are 68 in 100 that his true score will fall
between the limits 50 ± 3.16, or between 46.84 and 53.16.

As has been indicated in Chapter XIII, the probable error of
measurement is .6745 times the standard error, and in the example
cited we can say that the chances are 50-50 that the individual’s
true score will fall between the limits 50 ± (.6745 × 3.16), or
between 47.87 and 52.13.
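The arithmetic of the two preceding paragraphs can be checked with a short Python sketch. It simply applies σ(meas.) = σ(dist.)√(1 − r11) and the .6745 probable-error factor to the example figures (standard deviation 10, reliability .90, obtained score 50); the function name is illustrative only.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """sigma(meas.) = sigma(dist.) * sqrt(1 - r11)."""
    return sd * math.sqrt(1.0 - reliability)


PROBABLE_ERROR_FACTOR = 0.6745  # probable error = .6745 x standard error

sem = standard_error_of_measurement(sd=10.0, reliability=0.90)
score = 50.0

# About 68 chances in 100: obtained score plus or minus one standard error.
low_68, high_68 = score - sem, score + sem
# 50-50 chances: obtained score plus or minus one probable error.
pe = PROBABLE_ERROR_FACTOR * sem
low_50, high_50 = score - pe, score + pe

print(f"SEM = {sem:.2f}")                            # 3.16
print(f"68 in 100: {low_68:.2f} to {high_68:.2f}")  # 46.84 to 53.16
print(f"50-50:     {low_50:.2f} to {high_50:.2f}")  # 47.87 to 52.13
```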

Effect of Test Length on Reliability. Other things being equal, the
test which has the most parts, which is measuring the most elements
of a given ability, is the most reliable. The average of a number of
broad jump trials is a more reliable measure of ability to broad jump
than any one broad jump performance. Ten track events or a
decathlon give a more reliable estimate of a boy's ability in track
than two. When the sampling is increased, the confidence which
can be placed in the test as truly representing all that is to be meas-
ured is also increased. Knowing the reliability of a single trial on a
test, we are able to estimate the reliability of the sum or average of
two or more trials or tests. This is done by means of the
Spearman-Brown prophecy formula.⁸

If what one wishes to know is not the reliability of the sum or average
of two or more trials in a given test but the reliability of the sum or
average of a number of different tests, one may estimate
that figure rather closely by means of the Spearman formula for the
correlation between sums or averages.⁹ This formula is, however,
quite complicated for the person who has not had considerable
training in statistical methods. The Spearman-Brown formula is
much simpler and will give a very useful approximation, probably
a little too high. The extent to which it is too high will depend
upon the degree to which the reliability coefficients of the component
subtests exceed their intercorrelations. In many instances the over-
estimate resulting from the use of the Spearman-Brown formula will
not be great.
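The prophecy formula itself is a one-line computation. The sketch below, in Python, applies it to the four running tests listed in the example that follows, using their average reliability coefficient; as the paragraph above cautions, when the components are different tests rather than repeated trials the figure is probably a little too high.

```python
def spearman_brown(r11: float, n: float) -> float:
    """Predicted reliability of the sum or average of n comparable forms
    (or trials), each with reliability r11: n*r11 / (1 + (n - 1)*r11)."""
    return n * r11 / (1.0 + (n - 1.0) * r11)


# 50-, 100-, 150- and 220-yard dashes; their coefficients average .950.
coefficients = [0.950, 0.970, 0.940, 0.940]
average_r = sum(coefficients) / len(coefficients)
print(f"{spearman_brown(average_r, len(coefficients)):.3f}")  # 0.987
```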

As an example of the use of the Spearman-Brown formula, suppose
we have a number of tests of speed in running:


8Kelley, T. L., Statistical Method, p. 205, formula (157):
$r_{af,Af} = \dfrac{a\,r_{11}}{1 + (a - 1)\,r_{11}}$
in which af represents the average score on “a” forms of a test, Af represents the average score on “a” other similar forms, and $r_{11}$ is the average reliability coefficient of the “a” forms.
9Ibid., p. 197, formula (147).




                    Rel. Coef.
 50-yard dash          .950
100-yard dash          .970
150-yard dash          .940
220-yard dash          .940
      Average          .950


and we wish to know how much reliability can be placed in the aver-
age score in all four tests. Substituting in the equation cited in the
footnote, we have
$r_{af,Af} = \dfrac{4 \times .950}{1 + (3 \times .950)} = .987$
When subtests in a battery are comparable in every respect, the
Spearman-Brown formula may be used to predict the reliability of
the battery as a whole. It can be readily seen that the reliability
coefficient of the battery as a whole is materially greater than
the reliability coefficient of any single item.

Influence of Reliability on Validity. A test which does not correlate
more than .6 with itself cannot, except by chance, correlate higher
than .6 with anything else, except in rather infrequent situations
where the reliability of “anything else” is very much greater than
the reliability of the test itself. Hence it will be immediately seen
that an item in a battery of tests which has a low self-correlation, or
a test which does not have a high reliability coefficient, cannot
be highly valid. It does not follow from this that reliable tests are
always or generally valid. Low reliability will tend to pro-
duce low validity, but high reliability may have very little to do with
validity.

Reliability in Individual and Group Measurements. When meas-
uring a group, as in a survey test of an entire school or school system,
we are dealing with averages and hence may obtain reliable measures
even though only a small number of subtests are contained in the battery.
When a test is used for purposes of diagnosis, however, each item of
the test is used for interpreting strength or weakness and
must be quite reliable. A battery of tests with a small number of
subtests may be used for survey purposes, but for the same degree of
reliability a diagnostic test may be several times as long.

Degree of Reliability. The question may be aptly put, “What
should the reliability coefficient of a battery of tests be in order that
the battery may be safely used?” The answer to this question has
been given only in an approximate fashion. It will depend to some
extent on the time consumed in testing and upon the group for
which the test is designed. A test sampling a wide range of ability




is apt to yield a higher reliability coefficient than if it sampled
only a single grade. Ruch and Stoddard,¹⁰ with a great deal of
hesitation, have offered the following suggestions as “some rough
guides” for educational tests.


Rel. Coef.   Interpretation or significance
.95-.99      Very high: rarely found among present tests.
.90-.94      High: equaled by a few of the best tests.
.80-.89      Fairly high: fairly adequate for individual measurement.
.70-.79      Rather low: adequate for group measurement but not very
             satisfactory for individual measurement.
Below .70    Low: entirely inadequate for individual measurement,
             although useful for group averages and school surveys.
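The remark that a diagnostic test may be several times as long as a survey test can be made concrete by solving the Spearman-Brown formula for the number of comparable forms, n = r_target(1 − r11) / [r11(1 − r_target)]. The Python below is a sketch of that algebraic rearrangement; it inherits the formula's assumption that the added material is comparable to the original, and the .70 and .90 figures are a hypothetical pairing chosen from the bands in the table above.

```python
import math


def lengthening_factor(r11: float, target: float) -> float:
    """Number of times a test must be lengthened with comparable material for
    its reliability to rise from r11 to target (Spearman-Brown solved for n)."""
    if not (0.0 < r11 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliability coefficients must lie strictly between 0 and 1")
    return target * (1.0 - r11) / (r11 * (1.0 - target))


# Hypothetical case: a single trial with reliability .70 ("rather low"),
# to be raised to .90 ("high") for individual diagnosis.
n = lengthening_factor(0.70, 0.90)
print(f"lengthen about {n:.2f} times, i.e. use {math.ceil(n)} comparable trials")
```

Here a "rather low" single trial would need about four comparable trials to reach the "high" band.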


3. Objectivity. One of the factors contributing to the reliability
of a test is the degree of objectivity which is involved in the scoring
of test items. By objectivity is meant the degree of uniformity with
which various teachers may score the same tests. In written exam-
inations it is quite obvious that different teachers or examiners will
not score the same papers alike when judgment is required in scor-
ing. In physical education activity tests, however, objectivity in
scoring does not present much of a problem, since scores are recorded
in units of time, distance, height or number of times a particular
exercise is accomplished. Care must be taken in reading the stop
watch or tape correctly or in counting performances, but these do
not present the problem that test makers in other fields have to face.

Objectivity in scoring will, of course, affect reliability in any test,
but the chances of recording wrong scores are by no means so great
in physical tests as in tests in the academic lines.

4. Administrative Economy. Three important items ought to
be considered under this heading.

Time Requirements of the Test. One cannot say that the shorter
the test in point of time consumed in administering it, the better it
will be for use. As a matter of fact, in so far as test items or subtests
in a battery are concerned, the longer the test, other things being
equal, the greater the reliability, the more factors it will measure
and consequently the higher will be its validity. In physical educa-
tion, the time element must be given consideration because
less time is given to physical education than to many of


10Ruch, G. M. and Stoddard, G. D., Tests and Measurements in High School
Instruction, p. 54. Yonkers-on-the-Hudson, New York, World Book Company, 1927.




the other subjects. Even with the same length period, our time will 
still be shorter because of the consideration which must be given to 
changing clothes and bathing. If too much time is taken in testing, 
our teaching possibilities are materially reduced. However, it is 
better to get complete data on a student at one time than to get 
incomplete data and find that more testing is required. It is certain 
that, if we wish to measure correctly, we must pay the price. Espe- 
cially is this true of testing for diagnostic purposes. 

Cost. The time required to give a test may play a very important 
part in the matter of cost. Students may have to be trained and 
hired as helpers. Clerks may have to be trained and hired for 
computing scores and the like. If testing is to be of value, the results 
must be secured and interpreted almost immediately, and with 
large groups teachers cannot be expected to assume the entire 
burden. 

The cost of securing test blanks must be taken into consideration. 

If special apparatus is to be used, the construction of this will be 
another added feature to the cost item. Other things being equal, 
the less special apparatus needed the more desirable the test. 
Special apparatus involves special directions, more intricate pro- 
cedures, more time involved in scoring, more chance for error in 
scoring and consequently less reliability. 

Adaptability of the Test. The criterion of administrative economy 
also implies that the test selected for use is in keeping with program 
purposes. A test should not only be valid in terms of measuring the 
qualities it purports to measure, but also the quality to be measured 
must be a well defined objective of the teaching situation. Too often 
teachers will select a standardized test solely because it meets the 
criteria of a good test, and give little thought to its purpose in relation to
pupil needs and teaching objectives. Aimless testing is a needless 
waste of teaching time, and each test must be judged in terms of 
its appropriateness to the situation at hand. 

5. The Use of Norms. An unfortunate circumstance which has 
operated to restrict materially the usefulness of tests for physical 
ability and development has been the lack of adequate and repre- 
sentative age and grade norms. Brace’s Scale of Motor Ability 
Tests¹¹ presents norms based on sufficient numbers of cases to warrant
consideration as valuable for use, even though these norms were 


11Brace, David K., op. cit. 




computed from cases in a particular locality. Whether the norms
set up for this test are suitable for use elsewhere may be
questioned. If national norms are to be established, a sampling of
performances over the entire country should be taken. This may 
involve difficulties well nigh impossible to overcome. However, 
mere numbers do not produce good norms. Adequate sampling plus 
a sufficient number of cases to reduce the standard error of estimate 
to a negligible quantity are the keynotes of good norms. To illustrate 
the point, let us assume a specific example. 


TABLE XXIV
RUNNING BROAD JUMP

                      Situation I           Situation II
Grade              Mean or    No. of     Mean or    No. of
                   average    cases      average    cases
IX                 12'9"      2500       12'9"      25,000
X                  13'6"      2500       13'6"      25,000
XI                 14'2"      2500       14'2"      25,000
XII                14'10"     2500       14'10"     25,000
College freshmen   15'5"      2500       15'5"      25,000


Suppose that the standard deviation (σ) in both situations is 15
inches. Then by the formula for the standard error of a mean or
average (see page 275, Chapter XIII),
$\sigma_M = \dfrac{\sigma}{\sqrt{N}}$
In Situation I this would be $\sigma_M = \dfrac{15}{\sqrt{2500}} = .3$ inches.
In Situation II it would be $\sigma_M = \dfrac{15}{\sqrt{25{,}000}} = .095$ inches.


Using normal probability tables (see Garrett or Kelley or Table
XXXII): in Situation I, the chances of obtaining a mean score for
a grade group which varies from the mean shown in the table by as
much as 1 inch are only 1 in 2000. In Situation II, the chances of
obtaining a mean score for a grade group which varies from the
mean shown in the table by as much as 3/8 inch are approximately
1 in 25,000.
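These probabilities can be reproduced approximately from the normal curve with Python's `math.erfc`. The sketch below assumes, as the printed round figures appear to, that only one tail of the curve is counted (a deviation in one stated direction). Under that assumption it gives roughly 1 chance in 2,300 for Situation I and roughly 1 in 26,000 for Situation II, the same order as the figures in the text.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """sigma_M = sigma / sqrt(N)."""
    return sd / math.sqrt(n)


def one_tailed_chance(deviation: float, se: float) -> float:
    """Normal-curve probability that a sample mean lies at least `deviation`
    above the true mean (the same figure applies below it)."""
    z = deviation / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))


SD_INCHES = 15.0
cases = [("Situation I", 2500, 1.0), ("Situation II", 25_000, 0.375)]
for label, n_cases, deviation in cases:
    se = standard_error_of_mean(SD_INCHES, n_cases)
    p = one_tailed_chance(deviation, se)
    print(f"{label}: sigma_M = {se:.3f} in.; about 1 chance in {1 / p:,.0f}")
```

Counting both tails instead would roughly halve each "1 in ..." figure, which does not change the argument that 2500 cases already give stable norms.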




Since the least difference between norms is 7 inches, the reader 
can readily see that the chance for interpreting a pupil’s score 
incorrectly through errors in norms is very slight in Situation I and 
still more slight in Situation II. The point is that, for all practical
purposes, 2500 cases is sufficient for a test of this sort. There is 
much more danger of an error in the pupil’s test score than in the 
norm. Hence the emphasis which has been put on reliability of the 
test. 

6. Duplicate Forms. At first thought it would seem that the 
necessity for having two or more forms of a given test is not nearly 
so pressing in physical education as in academic subjects. Ordinarily,
with mental tests there are undoubtedly memory effects if the same 
test material is repeated within a reasonable period of time, say 
within a year to eighteen months. To some extent at least, the 
“carry-over” in physical tests is entirely possible. Boys and girls 
are interested in self-testing activities and are inclined to repeat 
and practice certain events which interest them and challenge their 
abilities and skills. Hence it would seem extremely desirable to 
provide at least two forms of a test measuring a particular element 
of physical ability. These forms must be equivalent in the strict 
sense of the word as used in educational measurement. 

In order that forms may be exactly equivalent, a number of con-
ditions must be met:¹²

1. The items or single test events represented by the two forms 
or the several forms must be samplings taken at random from a 
large amount of material which has been proved valid for measuring 
the particular ability in question. For example, suppose we wish to 
measure skill in “arm and shoulder-girdle coordination." To do 
this 10 to 15 single throwing tests have been used and arranged in 
order of difficulty from 1 to 15. Tests 1, 7, and 13 are selected for the 
first form and 2, 8, and 14 for the second. 

2. No duplication of single test events should be permitted. 

3. The average difficulties of the two or more forms should be
equal, that is, one form ought not to be preferred over another. 

4. Within reasonable limits, the spread of scores of pupils on the
two forms, or the variability of the two forms, should be the
same.
5. “The scores of individual pupils should vary as little as possible 


12Adapted from the discussion in Ruch, G. M., and Stoddard, G. D., op. cit.,
pp. 65-66.




from form to form; i.e., each form should be made long enough to
provide stable and reliable individual measures.”¹³
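When two forms have been given to the same pupils, conditions 3, 4 and 5 can be examined with three familiar statistics: the mean of each form (average difficulty), the standard deviation of each form (variability), and the correlation between forms (stability of individual scores). The Python below is a minimal sketch with hypothetical scores for six pupils; it reports the figures but deliberately sets no numerical thresholds, since the text leaves "within reasonable limits" to the test maker's judgment.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation between paired scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_forms(form_a: list[float], form_b: list[float]) -> dict[str, float]:
    """Statistics bearing on conditions 3-5 for two forms taken by the
    same pupils (scores listed in the same pupil order)."""
    return {
        "mean_a": mean(form_a), "mean_b": mean(form_b),   # condition 3: difficulty
        "sd_a": pstdev(form_a), "sd_b": pstdev(form_b),   # condition 4: variability
        "r_ab": pearson_r(form_a, form_b),                # condition 5: stability
    }


# Hypothetical scores of six pupils on Form A and Form B.
form_a = [12, 15, 9, 20, 14, 8]
form_b = [13, 14, 10, 19, 15, 7]
for name, value in compare_forms(form_a, form_b).items():
    print(f"{name}: {value:.2f}")
```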

It is essential that the authors of tests give all statistical data 
necessary to the proof of their equivalent forms in the manual of 
directions for the use of the test. When data are not presented it 
must be assumed that the "equivalence of forms" has been empiri- 
cally set up. 

7. Standardized Directions. This is a very important item,
although neglected in many tests of physical skills. The directions
which are given the examiner and the student should be carefully
worked out, as well as the exact method of administering particular
phases. These directions should be printed and not left to the
imagination. With physical tests especially, illustrations should
accompany the manual of directions. An excellent illustration of
the photographic detail necessary in working out directions for
physical tests is to be found in Brace’s study.¹⁴ One illustration
may serve to show the necessity for explicit directions. In a simple
test like chinning the bar a number of things must be given
consideration: (1) the size of the bar, (2) distance of the bar from
the ground, (3) method of holding the hands, that is, front grasp,
inverted grasp or alternating grasp, (4) whether or not the arms are
required to be extended at full length on the downstroke, whether
rest between chins is allowed, etc. All of these things will have an
influence upon performance.¹⁵


Selected References 


DOUGLASS, HARL R. and MILLS, HUBERT H.: Teaching in High Schools. New
York, The Ronald Press, 1948. Pp. viii and 627.

Deals in Chapters XX and XXI with measuring, evaluating and reporting
pupil progress. Indicates the purpose of measurement, criteria for selecting tests,
measuring understanding, giving and scoring examinations, marking and reports
to parents.
GREENE, HARRY A., JORGENSEN, ALBERT N., ...: Measurement and Evaluation in the ...
New York, Longmans, Green and Company, 1945. Pp. xxvi and 670.
Of special importance in these pages are the references to factors which
should be considered in a criterion for tests and the reproduction of the Cole-von
Borgersrode Scale for Rating Standardized Tests and the Otis Scale for Rating
Tests.


13Ruch, G. M. and Stoddard, G. D., op. cit., p. 66.
14Brace, David K., op. cit.
15For specific directions on this particular event, see p. 575.




KELLEY, Truman L.: Interpretation of Educational Measurements. Yonkers-on- 
the-Hudson, New York, World Book Company, 1927. Pp. xiii and 565. 

Chapter II, “Purposes Served by Educational Tests,” pp. 18-42, discusses a
number of items not implied by the chapter title including several valuable 
points regarding validity and reliability not ordinarily touched by the average 
text. 

Chapter VIII, “Measures of Relationship,” pp. 151-195, shows that a relia-
bility coefficient of .50 or higher is needed for group measurement and .94 for
individual diagnosis.

ODELL, C. W.: Educational Measurement in High School. New York, The Cen-
tury Company, 1930. Pp. xiv and 641. Chapter III, “Criteria for the Selection
of Tests,” pp. 52-89.

Besides the usual discussion of validity, reliability, norms, standardized 
directions and the like, Odell presents some of the factors which may affect 
reliability and discusses constant and variable errors in connection with relia-
bility.

Ross, C. C.: Measurement in Today's Schools. New York, Prentice-Hall, Inc., 
1947. Pp. xvi and 551. 

Chapter III, “The Characteristics of a Satisfactory Measuring Instrument," 
discusses at length validity, reliability, and usability.

RUCH, G. M. and STODDARD, GEORGE D.: Tests and Measurements in High School
Instruction, Chapter IV, pp. 45-68. Yonkers-on-the-Hudson, New York, 
World Book Company, 1927. Pp. xi and 381. 

This is an excellent chapter on criteria for the selection of educational tests. 


CHAPTER XVIII 


Test Construction in 


Physical Education 


Determination of the Quality to be Measured 


The first step in the construction of an achievement test in physical
education is to determine the quality to be measured. This deter-
mination may come as a result of a study of the aims and objectives
of instruction in physical education in a particular grade or set of
grades, or it may come as a result of a detailed analysis of the ability
in question. Thus in deciding upon what constitutes general athletic
ability, Cozens¹ used the composite judgment of a large number of
persons in the profession and found that, in the opinion of these
individuals, general athletic ability could be broken up into seven
elements, namely: (1) arm and shoulder-girdle strength, (2) arm
and shoulder-girdle coordination, (3) hand-eye, foot-eye, arm-eye
coordination, (4) jumping or leg strength and flexibility, (5) en-
durance, (6) body coordination, agility and control and (7) speed
of legs. Brace, on the other hand, found that a large number of
physical educators ranked the following as criteria of general motor
ability:²
                                                  Average rank
                                                  as to value
1. ...
2. ...                                                1.05
3. Skill in a variety of activities                   2.41
4. ... and graceful form in performances              ...
5. Great ability in some special line                 ...
1Cozens, Frederick W., The Measurement of General Athletic Ability, Eugene,
University of Oregon Press, 1929.
2Brace, David K., Measuring Motor Ability, p. 14. New York, A. S. Barnes and
Company, 1927.




He then concludes that “For purposes of this study, the term ‘motor
ability’ is used to apply to that ability which is more or less general, 
which is more or less inherent, and which permits an individual to 
learn motor skills easily and to become readily proficient in them.” 

McCloy offers an excellent analysis of the quality which he calls 
“General Motor Capacity.”³ In part he says:


The invention of valid and adequate tests of general intelligence 
marked the beginning of a greater increase in the scientific control 
of research in classroom education and in more satisfactory methods 
of classification for school work. In addition, these tests offered to 
thoughtful teachers a tool, the results of which gave a rapid and 
relatively accurate measurement of one of the child’s most important 
capacities — a measure that could be obtained in a day and which 
has been proved to be far more valid than teachers’ estimates made 
after a year’s observation. The fact that the naive and uninformed 
have misused this excellent tool by applications for which it was 
never intended and by interpretations that were weird in no way 
detracts from the usefulness of the discovery. 

There is a need in physical education for an analogous test equally 
valid and useful in its own peculiar field. This study attempts to 
present such a test, which, while it admittedly has shortcomings and 

imperfect subelements, may be utilized until a better test has been 
developed to take its place. 

As the ordinary intelligence tests are really tests of general abstract 
intellectual capacity, so this is a test of general innate motor capacity. 
In this term the word “capacity” indicates that the test attempts 
to measure not so much developed ability as innate potentialities — 
the limit to which the individual may be developed. The word 
“motor” is used primarily in the sense of the neuromuscular and 
only secondarily in the sense of the psychomotor. In other words 
there is no attempt to measure what might be called “athletic 
smartness” but rather to measure capacity to learn new skills, as 
well as to measure the more distinctly large-muscle capacities 
involved in potential strength and speed of muscular contraction. 
The word “general” indicates that these motor capacities measured 
are the basic fundamental ones that apply to almost all motor per- 
formance. There is no attempt to measure specific skills and abilities. 

In devising a test such as this the student is confronted with the 
dilemma faced by the formulators of intelligence tests, namely, that 
in an endeavor to measure capacities or potentialities certain skills 
and abilities — as they are at the time — must be utilized. The 
intelligence tests avoid this difficulty largely by utilizing items of 

information which may be assumed to be either the common prop- 
3McCloy, C. H., “The Measurement of General Motor Capacity and General
Motor Ability,” Supp. to Res. Quart. Am. Phys. Educ. Assoc., Vol. V, No. 1
(March, 1934), 46-47.


Test Construction in Physical Education 341 


erty of almost all those who take the tests or to which the tested 
person may be assumed to have been exposed to a degree that, if 
the individual is sufficiently intelligent, should have been enough to 
insure his learning them. Thus, for tests devised for use in the senior
high school, items are used which will have been presented to every-
one in the grades. The test, of course, becomes invalid for those
who have not had a grade school education or its equivalent. In 
the devising of this test of general motor capacity the same difficulty 
has had to be faced; and since the physical educator is less warranted 
than is the mental educator in assuming that any standardized skills
have been learned in preceding grades, because of the chaotic lack
of standardized curricula throughout our country, the elements 
chosen should be: (1) of themselves innately valid — such as ele- 
ments of age, height, etc.; (2) such as to permit of a standard degree 
of practice being given previous to the test in order that all may be
familiarized with the skill or form; or (3) only items that have been
unpracticed by all so that the present opportunities for learning will
be equal for each of the tested pupils.


We see then that a complete analysis of the ability to be measured
is essential as a first step in test construction.


Setting Up Criteria For Establishing Validity 

Validity has previously been defined as the degree to which a test
measures what it purports or claims to measure. Hence in deter-
mining the validity of a given test, that is, in determining whether
it measures what it claims to measure, we must have some measure
with which to compare our test. So much work has been done in
the field of mental tests that criteria are more readily available there. In
physical education, however, it will be necessary to set up a criterion
or criteria by which to judge the measure at hand. The recent test
studies in the field have done this, but it is doubtful whether all the
possibilities have as yet been considered. A résumé of the possible
ways of validating physical education tests or test items will include
the following:

(1) Analysis of the content of courses of study.
(2) Combined judgments of experts in the field.
(3) The use of rating scales.
(4) Correlation with measures of success of the ability in question.
(5) Increase of accomplishment with successive ages.
(6) Correlation with previously validated tests.
(7) Social usefulness.
(8) Combination of several of the above.




1. Analysis of Content of Courses of Study. An analysis of courses 
of study, though the courses are not likely to contain much detailed 
content, should nevertheless be very useful in determining repre- 
sentative practices in the field. A determination of aims and 
objectives will assist the test builder in formulating his material. 
It will probably be necessary for him to consult other sources of 
information, such as textbooks on various phases of physical educa-
tion, magazine articles and the like. It should be pointed out that a
test built upon an analysis of courses of study may represent and 
measure what is taught rather than what ought to be taught. As 
a matter of fact, an analysis of this sort really represents an analysis 
of a number of expert judgments as to what a curriculum should 
contain. 

2. Combined Judgments of Experts in the Field as a Means of Vali-
dation. When all is said and done, practically all of the possible
means of validation rest on combined judgments. Curricula of 
physical education rest upon the opinions of the leaders in the field 
and are the result of a constant working-over of material. The value 
of combined judgments can be seen at the beginning of this chapter 
in the determination of the elements which make up a certain quality. 
Judgment ratings are used quite extensively in representing the 
opinion of the judge as to how much of a given thing a pupil has. 
In physical education activity work this may be a rating on motor 
ability, general athletic ability, gymnastic ability or ability in any 
one of many different lines. “Experience has shown that the average
or median judgment of a group of from three to ten careful judges is
certain to be superior to the opinion of a single worker in approxi-
mating the true worth and difficulties of proposed test items.”⁴
A method of rating individuals in a particular trait may be accom-
plished as follows. Submit to each judge a list of the individuals to
be rated, together with an explanation of the quality to be measured
and the number of groups or classes to be considered in the judgment.
Thus, if a test builder wished to get an opinion of a number of judges
on general athletic ability, the number of groups to be considered
might be five (superior, above average, average, below average,
and inferior), with point values of 5, 4, 3, 2 and 1 respectively. The
scores of each individual might then be averaged and correlated

4Ruch, G. M. and Stoddard, G. D., Tests and Measurements in High School 


Instruction, p. 510. Yonkers-on-the-Hudson, New York, World Book Company, 
1927. 




with each test item or a combined score of several test items. “A 
high correlation between a test and its criterion may be taken as 
evidence of validity, provided both the test and the criterion are 
reliable. But before accepting criterion correlations as final, we 
should know the reliability of our test, and if possible the reliability 
of the criterion.”⁵
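The combined-judgment criterion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each judge places every pupil in one of the five groups named in the text (scored 5 to 1); the helper names `averaged_judgments` and `pearson_r`, the ratings and the test scores are all hypothetical.

```python
from statistics import mean

# Point values for the five judgment groups named in the text.
POINTS = {"superior": 5, "above average": 4, "average": 3,
          "below average": 2, "inferior": 1}


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def averaged_judgments(ratings_by_judge):
    """Average each pupil's point value over all judges.

    ratings_by_judge holds one list of group labels per judge,
    with the pupils in the same order in every list.
    """
    return [mean(POINTS[label] for label in labels)
            for labels in zip(*ratings_by_judge)]


# Hypothetical data: three judges rate four pupils on general athletic ability.
judges = [
    ["superior", "average", "below average", "above average"],
    ["above average", "average", "inferior", "superior"],
    ["superior", "below average", "below average", "above average"],
]
criterion = averaged_judgments(judges)
test_scores = [82, 60, 41, 75]  # hypothetical scores on one test item
validity = pearson_r(test_scores, criterion)
print([round(c, 2) for c in criterion], round(validity, 3))
```

As the quoted passage warns, a high coefficient here counts as evidence of validity only if both the test and the averaged ratings are themselves reliable.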
3. The Use of Rating Scales in Setting Up Criteria. This method
shades closely into the method of combined judgments. Technically,
however, a rating scale would be made up of a number of elements
which combine to make up a general quality. The example taken
from Brace, at the beginning of this chapter, indicates a rating scale
on the quality of general motor ability. Authorities feel that rating
scales should be used with great care and that their popular use is
due to lack of better measures and ignorance of their limitations.
“Rating scales are highly subjective and hence open to errors of
measurement to a marked degree (a phenomenon doubly subtle
because of the apparent simplicity of ratings). Their use should be
confined to a last resort.”⁶
4. Measures of Success as a Criterion. Makers of mental tests of
all sorts have used teachers’ marks for the validation of test material.
Teachers’ marks in physical education also have possibilities for use
as a criterion of success, but it must be remembered that they have
limitations, since they represent many other things besides
achievement, namely, interest, attitude, improvement, effort, behavior
and the like. Besides teachers’ marks as possible criteria of success in
physical education, it is possible to use such things as success in
making school teams or in making school letters as measures of
general athletic ability. A test of gymnastic ability, if valid, should
show that the best gymnasts in school score very high on it, so high,
in fact, that only a small percentage of the normal run of pupils
will ever be able to attain the average score of gymnasts. As a
possible criterion of motor ability, Brace⁷ uses “the sum of scores on


a variety of athletic events in which [illegible] opportunity to make
the best scores of which they [illegible].” [The remainder of this
passage is illegible in the source.]

⁵Garrett, H. E., Statistics in Psychology and Education, p. 325. New York, Longmans, Green and Company, 1945.
⁶Ruch, G. M. and Stoddard, G. D., op. cit., p. 313.
⁷Brace, David K., op. cit., p. 25.

5. Increase of Accomplishment with Successive Ages. [The opening
of this paragraph is illegible in the source.]


344 Theory and Practice of Test Administration 


ability, general athletic ability and abilities of this nature (that is, 
physical abilities) increase with age, a test to be a valid measure of 
any element of these abilities should show a gradual and somewhat 
uniform rise in success as we go up the age scale. If we find that a 
given test, such as the standing broad jump, does not show an 
increase in ability at the successive age periods, that test will not be 
a valid measure of jumping ability throughout all the age periods 
to maturity or eighteen years of age, let us say. However, what we 
do find is that samplings of all sorts of physical strength and skill 
tests in a limited age range show decided progression and at the same 
time show a wide range and overlapping in the fundamental activities.⁸

6. Establishing Validity by Correlation with Previously Validated
Tests. It should be quite apparent that when material intended to
measure a given function is correlated with a test already known to
measure that function, and the resulting coefficient is high, that
material may be considered a measure of the function. As an example,
there seems to be sufficient evidence of the validity of Rogers’
Strength Index as a measure of general athletic ability with high
school boys. A reliable battery of activity tests purporting to
measure the general athletic ability of high school boys and showing
a correlation of .80 to .90 with Rogers’ Strength Index ought to be
considered a valid measure of general athletic ability.

7. Validation by Determining Social Usefulness. As an aid in deter- 
mining the validity of test material, a selection might be made on 
the basis of tests which measure fundamental skills most useful in 
our daily social life. The question might well be asked, “What are 
the fundamental physical skills which are most useful in our every- 
day life or which contribute most to the physical efficiency of the 
individual in his social world?” In setting up a test for college men
Professor Kleeberger of the University of California analyzed this 


⁸See Bliss, James G., “A Study of Progression Based on Age, Sex and Individual Differences in Strength and Skill,” Am. Phys. Educ. Rev., XXXII (February, 1927), 88-89.

The achievement scale studies on elementary school children and secondary school boys are excellent illustrations of this progression. See: Neilson, N. P. and Cozens, Frederick W., Achievement Scales in Physical Education Activities for Boys and Girls in Elementary and Junior High Schools, p. 164. New York, A. S. Barnes and Company, 1934.

Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P., Physical Education Achievement Scales for Boys in Secondary Schools, p. 149. New York, A. S. Barnes and Company, 1936.




problem and concluded that college men in order to be able to take 
care of themselves physically in our social world ought: 

(1) To have developed a degree of agility in everyday experiences 
in running, jumping, climbing, vaulting and falling; 

(2) To know how to defend themselves in either boxing, wrestling 
or fencing; 

(3) To have developed skill in swimming and in helping a com- 
panion in the water. 

The validity of the University of California physical efficiency 
tests is partially established therefore by means of the criterion of 
social utility. 

The selection of tests based on a criterion of this sort might be
made from a study of the following fundamental bodily skills,
exclusive of swimming:


1. Running.
2. Jumping.
3. Throwing.
4. Catching.
5. Kicking.
6. Pushing.
7. Pulling.
8. Dodging (that is, ability to change direction quickly).
9. Hand-eye coordination with an implement or bat of some sort.
10. Strength of torso muscles.
11. Balance.
12. Ability to get quickly over an obstacle.
13. Moving quickly while carrying an object.
14. Control of body in air or when going forward head first.
15. Control of body while hanging by arms or attempting to move from hanging or vaulting position.
The question may readily be put: in order to live fully in our
social world, does one need to know how to run, jump, climb, throw,
catch, etc.? If he does, the activity is socially useful and a test
embracing the activity is important.


Preliminary Try-out of the Tests or the Assembly of a Trial Battery of Tests


A Typical Problem. In order to make the procedure of test
construction clearer, the selection of a definite problem will be
of special value. Let us suppose that it is desired to set up a test
measuring The Physical Activity Age of Junior High School Boys. 
This is a rather complicated problem and it is not possible to go into 
all of its intricacies but for the sake of illustration, we may say that 
the composite judgment of a number of experts in the field of physical 
education indicates that (exclusive of the element of swimming) 
the things that go to make up “physical skill" with Junior High 
School Boys are those fifteen fundamental elements previously 
listed. The Junior High School Boy, to be physically skilful, must
know how to run, jump, vault, throw, catch, dodge, etc.

Selection of Test Items. Suppose also that a considerable 
number of single tests under each one of these elements has been
listed. Take the case of jumping as an example. The following 
selection obviously measures various phases of that ability: 
(1) standing broad jump, (2) running broad jump, (3) standing 
hop, step and jump, (4) running hop, step and jump, (5) high jump, 
and (6) jump and reach, or head touch jump as it is sometimes 
called. A listing of tests under all the other elements should be 
made. We shall probably need only one test of jumping skill in our 
final selection of tests, but for a preliminary try-out a large number 
is desirable in order to allow for elimination due to unreliability, 
difficulty of administration, difficulty of standardization and the 
like. 

Preliminary Try-out. Test instructions should be definitely 
formulated so as to insure the same opportunity for each boy. As a
matter of fact, some experimentation with the various tests should 
have been performed prior to the trial testing to insure standardized 
directions. The boys to be tested should represent as typical a 
school group as possible and should be selected by rooms and grades 
so that at least 100 boys of each of the ordinary junior high school 
ages are obtained. 

Each boy should be given a chance to do his best in the particular
event in question by allowing him [three trials and recording the]
best of the three. The tests [illegible] muscle groups will not be
[illegible] of events for three groups:

1. Dash, throw, jump.
2. Jump, throw, dash.
3. Throw, jump, dash.

It is essential that warm-up be taken into consideration, [particularly
in ev]ents such as throwing. When [all the boys of] a particular age
have been tested in a given event, a distribution of the scores should
be made. It will probably [be well to] lay out a distribution for the
entire group as well as for each age.
Reliability of Single Tests. An adequate sampling of [boys] should be
given every test twice in order that the reliability of [each] test
may be computed. If possible, at least 100 [boys] should be used for
this work. This procedure might be accomplished as follows: on
Tuesday, let us say, a group of thirteen-year-old boys has been given
a dash, a baseball throw for distance and the standing broad jump.
This group next meets on Wednesday. Using the same order of events
and the same procedure (instructions), [we take] the scores again on
Wednesday. A correlation between the scores of Tuesday and those of
Wednesday will yield a reliability coefficient, the size of which may
determine whether or not a particular test is worth while from the
standpoint of measuring accurately whatever it measures.
Elimination of Tests. Not all the forty or fifty test items or
[events] which are included in the original list will be found fit
for selection in the final battery to measure “physical skill.” In
fact, [many] will have to be eliminated for one reason or another.
[Some of] the causes of elimination will be:

1. Unreliability; that is, a test is not consistent in measuring
what it does measure. Such unreliability may be determined by
securing a coefficient of correlation between two applications of
[the test].

2. Tests which require too much time to secure results may have
to be eliminated. For example, it will take between fifteen and
twenty minutes to test a group of ten to twelve boys in the running
high jump, allowing three trials at each height and using only one
pit, while it may be possible to test a large part of the same
ability in five minutes by such an event as the jump and reach.

3. If tests are to be useful to groups everywhere, the use of
apparatus which is not common to all school situations must be
avoided.

4. Tests involving such intricate directions and administrative
[details] that only a few trained observers can give them must be
[eliminated]. The physical education teacher in the Junior High
School must depend upon student help in administering his testing
program, and therefore tests must be simple and objective.

5. Tests must also be eliminated which limit performance at 
either end of the scale. That is, there should be no zero or perfect 
scores or, if any, very few. If in some accuracy throwing test we
find scores bunched at either end of the distribution, that test is not
measuring accurately the apparent zero performance or perfect
performance. As a matter of fact, there should be some experimental
pretesting in order to weed out tests showing this possibility.
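A simple screen for this kind of bunching can be sketched as below, assuming the lowest and highest possible scores of the test are known (for example 0 and 10 hits in a hypothetical ten-throw accuracy test). The function names are hypothetical, and the 5 per cent limit is an arbitrary illustrative choice, not a figure from the text.

```python
def end_bunching(scores, lowest, highest):
    """Fractions of scores at the lowest and at the highest possible value."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= lowest) / n
    at_ceiling = sum(1 for s in scores if s >= highest) / n
    return at_floor, at_ceiling


def too_bunched(scores, lowest, highest, limit=0.05):
    """Flag a test whose zero or perfect scores exceed `limit` (an assumed cut-off)."""
    floor, ceiling = end_bunching(scores, lowest, highest)
    return floor > limit or ceiling > limit


# Hypothetical hits out of 10 in an accuracy throw.
hits = [0, 0, 3, 5, 10, 10, 10, 7]
print(end_bunching(hits, 0, 10), too_bunched(hits, 0, 10))
```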

6. Those tests which correlate very low with the criterion are not 
adding anything to the net result and these also should be eliminated. 

Securing an Adequate Criterion Score. This is sometimes a 
very difficult problem and the trial group should be selected with 
the thought in mind that an objective criterion score must be secured 
for as many of this group as possible. In the problem at hand there
are several possibilities of securing such a score. 

1. A judgment rating may be secured from several teachers in 
the school as to the pupil's ability in fundamental bodily skills. It 
is unlikely that enough judgments can be secured on every individual 
to compute a score for that pupil and other means may have to be 
used. 

2. This judgment rating may be supplemented by taking account 
of the boy's marks in physical education over a period of two or 
more semesters. 

3. Perhaps the most feasible way of securing an objective criterion
score would be to prove that a composite score covering all tests
could be used. This problem is similar to one which has previously
been studied and might be handled in the following manner.⁹

(a) The selection of fundamental bodily skills represents the com- 
posite judgment of experts in the field and may be safely assumed 
to comprise a fairly adequate sampling of all possible skills. The
sum of scores on each of these skills would then be a measure of a
boy's all-around ability. Performance distributions should be scaled
in such a manner that a given score in each event will represent the
same performance level. (See Chapter XVI.)

(b) Considering that all the fundamental skills are equally
valuable, tests under each of the skills should be averaged, so that a
skill measured by several tests is not weighted more heavily than a
skill measured by one. The sum of these averaged scores would then
be the composite score.
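Steps (a) and (b) can be sketched as below, assuming each boy's test scores have already been scaled so that equal scores mean equal performance levels. The function name, skill names and numbers are hypothetical. Averaging within a skill first keeps a skill that happens to have three tests from counting three times as heavily as a skill with one.

```python
from statistics import mean


def composite_score(scaled_scores_by_skill):
    """Average the scaled test scores within each skill, then sum over skills."""
    return sum(mean(scores) for scores in scaled_scores_by_skill.values())


# Hypothetical scaled scores for one boy.
boy = {
    "Running": [55, 61],       # two running tests
    "Jumping": [48, 52, 50],   # three jumping tests
    "Throwing": [60],          # one throwing test
}
print(composite_score(boy))  # skill averages 58, 50 and 60 are summed
```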

⁹Cozens, Frederick W., op. cit., pp. 152-156.




(c) The composite score as a criterion score can then be tested in 
at least two ways. 

(i) Secure as many judgments as possible on the group already 
tested and correlate these judgments with the composite score. 

(ii) Have a group of teachers select the most athletic boys in 
school and see how the composite scores of these boys compare with 
the average of all boys of their ages. 

4. Since Rogers’ Strength Index has been set up for Junior High
School boys, it might be possible to give it consideration as a
criterion for determining the physical activity age.

Experimental Conditions. Besides adhering to standardized
directions, it must be remembered that all experimental conditions
in the trial testing should follow those under which the final battery
is to be given. Ways and means should be devised to obtain the
complete cooperation of all subjects. This may be difficult to obtain
at times, but as a usual thing, when the purpose of the test is made
[clear], school children cooperate. Also, in physical tests there is a
desire on the part of every child to attempt to beat the score of a
companion, and hence everyone may be expected to do his best.

Scoring. When distributions of each event have been arranged and
it is desirable to transpose a score in time, or in feet and inches,
into a definite figure, the T-scale technique or a variation of it may
be used. By construction, sigma index scores (T-scores or a variation)
are comparable and therefore permit addition or averaging. In the
problem at hand this must be done in order to obtain a composite
or criterion score. In computing the scores on the tests, one of two
methods may be used:

(a) A distribution of each event may be made for each age, or
(b) A single distribution for all ages in each event may be set up.

The second procedure will probably be the safest in the long run,
since the scores of the various ages can then be compared. A score of
[four]teen-year-olds computed as in (a) cannot be compared to one
for thirteen-year-olds except in a relative manner, but if all ages
are thrown together such scores may be compared literally.

Selection of the Final Batteries

The preliminary work leading up to the selection of a final battery
or batteries of tests to measure the quality in question necessitates:

1. The securing of a complete set of scores on all tests of the preliminary battery;




2. A complete set of criterion scores for all subjects of the preliminary testing;

3. The elimination of all tests unfit for inclusion in a short battery 
or combination; 

4. The securing of a correlation coefficient between each remaining 
single test and the criterion score; 

5. The securing of all the intercorrelations between single tests 
in each element of fundamental bodily skill. When tests are found 
which correlate highly with each other and about the same with the 
criterion, one of these may be eliminated as measuring approximately 
the same ability as the other. 
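Step 5 can be sketched as a screen over pairs of tests, assuming the criterion correlations and the intercorrelations are held in dictionaries. The function name and both thresholds (an intercorrelation of .80 and a difference of .05 in criterion correlation) are illustrative assumptions; the text leaves that judgment to the investigator. The .804 intercorrelation of tests (13) and (14) and the .707 criterion correlation of test (13) appear later in the text; the .690 for test (14) is invented.

```python
def redundant_pairs(criterion_r, intercorr, min_intercorr=0.80, max_gap=0.05):
    """Find pairs of tests that correlate highly with each other and about
    equally with the criterion; keep the one nearer the criterion.

    criterion_r: {test: correlation with the criterion}
    intercorr:   {(test_a, test_b): correlation between the two tests}
    Returns a list of (keep, drop, intercorrelation) tuples.
    """
    flagged = []
    for (a, b), r_ab in intercorr.items():
        similar_validity = abs(criterion_r[a] - criterion_r[b]) <= max_gap
        if r_ab >= min_intercorr and similar_validity:
            keep, drop = (a, b) if criterion_r[a] >= criterion_r[b] else (b, a)
            flagged.append((keep, drop, r_ab))
    return flagged


criterion_r = {"13": 0.707, "14": 0.690, "2": 0.570, "6": 0.726}  # "14" is hypothetical
intercorr = {("13", "14"): 0.804, ("2", "6"): 0.487}
print(redundant_pairs(criterion_r, intercorr))
```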


The labor involved in (4) and (5) can only be appreciated when 
a problem of this sort actually has been worked. Since the problem 
suggested involves fifteen major variables with a number of subtests 
under each variable, an effort should be made immediately to reduce 
the number of major variables. If the averaged scores of two 
variables, as for example Pushing and Pulling, correlate highly with 
each other, these variables may be thrown together. One may find, 
for example, that the variables of Running, Dodging, Ability to Get 
Quickly over an Obstacle, and Moving Quickly While Carrying an 
Object, correlate so highly with each other that they may all be 
considered under one heading such as Speed of Legs with Body 
Control. Quite naturally this lumping of variables should be done 
before any other correlations are run. Product-moment correlation
coefficients should be determined in all cases.¹⁰

Combining the Tests. “Where several tests are to be combined 
into a battery there must be considered, not only the correlation of 
each test with the criterion, but also the correlation of each test with 
the other tests. This fact is embodied in the useful though inexact 
maxim which states that the correlation between the tests and the 
criterion should be as large as possible, whereas the correlations 
among the tests should be as small as possible.”¹¹ The statement
just made has exceptions, but these exceptions relate especially to
the signs of the correlation coefficients. It has been shown that in
physical skill tests with groups older than the group here considered,
the correlation coefficients between tests and criterion and the
intercorrelations are all positive.¹² Hence we may safely assume that
condition to be true in our problem.

¹⁰See Hull, C. L., Aptitude Testing, pp. 423-425. Yonkers-on-the-Hudson, New York, World Book Company, 1928, for a modification of method of securing correlation coefficients.
¹¹Ibid., p. 449.

The choice of tests to be used in the final battery may be made in 
one of two ways, (1) by an inspection of the correlation coefficients 
or (2) by employing the technique of partial correlation. When 
more than five variables are under consideration this second method 
becomes very involved and must be cast out. Kelley¹³ has developed
a variation of the partial correlation technique which may be
used by students who have a knowledge of advanced statistical
methods and by which much of the labor involved in calculating
regression weights may be eliminated with insignificant loss of
accuracy. Method (1) will require skill in the interpretation of
correlation coefficients, and for the sake of illustration a hypothetical
situation will be presented. Let us say that the fifteen fundamental
skills have been finally classified under eight headings, numbered I,
II, III, etc., and that the tests under each heading which appear
most prominent for trial battery selections are numbered 1, 2, 3, etc.
The criterion score is denoted by 0. Table XXV will give the desired
correlations and intercorrelations.

As has been previously indicated, one of the very important items
which must be considered in the selection of tests to measure a given
function is the reliability coefficient of these tests, that is, their
self-correlation. Just how high this should be in order to have the test
included in a battery is indicated by Garrett,¹⁴ who states that most
makers of general intelligence tests report a reliability coefficient of
at least .90 between duplicate forms of their tests for unselected
groups of the same chronological age. To be a reliable measure of
capacity, a mental or physical test should, generally speaking, have
a minimum reliability coefficient of at least .90. This minimum will
vary with the group, however. The reliability is considerably
affected by the range of scores made on the test, and to distinguish
between two groups of children of narrow range of ability, a
reliability coefficient of from .50 to .60 is adequate.

¹²Cozens, F. W., op. cit., pp. 157-189.
¹³Kelley, T. L., [title illegible], p. 502. New York, The Macmillan Company, 1924.
¹⁴Garrett, H. E., op. cit., pp. 514-515.

TABLE XXV
SHOWING HYPOTHETICAL CORRELATION COEFFICIENTS OF A PHYSICAL ACTIVITY TEST PROBLEM ARRANGED FOR THE PURPOSE OF MAKING THE CHOICE OF TESTS FOR THE FINAL BATTERY*

[The body of this table is illegible in the source.]

*These correlation coefficients are based on a study of [similar] nature made on college men. See Cozens, F. W., op. cit.

It will be noted that tests 7, 8 and 10 do not conform to this
criterion of reliability. Test number 10 can very well be eliminated
as a possibility in the final battery, since test 9 in this same group
has a desirable reliability coefficient and a higher correlation with
the criterion. If tests 7 and 8 are cast out, there will be no remaining
test to measure that particular element. For purposes of diagnostic
testing neither of these tests would be worth much, but for purposes
of measuring a general quality when combined with several other
tests they may be used without greatly affecting the reliability of
the battery.¹⁵ From the size of the correlation of the single tests in
each group with the criterion it looks as if our final battery will have
to be selected from the following:


Element I: (1) or (2)
Element II: (5) or (6)
Element III: (7) or (8)
Element IV: (9)
Element V: (11)
Element VI: (13)*
Element VII: (16) or (17)
Element VIII: (18) or (19)

*Test (13) is picked over test (14) since they are measuring approximately the same thing (intercorrelation of .804) and test (13) has a higher correlation with the criterion.


Test (7) appears to be the best to include in the battery since it
has the advantage of a lower intercorrelation in elements I, IV, V
and VI and is even with (8) in II and VIII. Further, it has a better
correlation with the criterion and a higher reliability coefficient.


In element I, test (2) has an advantage over (1) and (3) in four 
of the other elements and may therefore be tentatively selected. 
Test (5) in element II has the advantage of lower intercorrelation 
with most of the other possibilities. Likewise, tests (17) in element 
VII and (18) in element VIII appear to be the best selections on 
the basis of correlation with the criterion and intercorrelation with 
the other tests. This leaves the final battery comprised of tests 
2, 5, 7, 9, 11, 13, 17 and 18. 

Because all raw scores have been reduced to T-scores, the mean of
each test is 50 with a standard deviation of 10. Since this is true,
the weight of each test will be one, and the simple addition of
T-scores in each of the tests will give the final score on the test,
or the physical skill score. Thus

$$X_B = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8$$

will be the regression equation, in which $X_B$ is the battery score
and $X_1, X_2, X_3$, etc., are the individual test scores. Because of
the construction of the T-scale, the mean score on the battery should
be 8 × 50 = 400 points.

¹⁵For proof of this, see Cozens, Frederick W., op. cit., p. 148.
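A minimal sketch of this scoring, assuming the simple linear form of sigma scores, T = 50 + 10z, with z computed from the group's mean and population standard deviation. McCall's T-scale proper normalizes the distribution through percentile ranks, so this is a variation rather than that exact procedure. Timed events are reversed so that a faster time earns a higher T-score. The function names and raw scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_t_scores(raw, lower_is_better=False):
    """Sigma (linear T) scores: 50 + 10 * z, with z from the population SD.

    For timed events such as a dash a lower raw score is better, so the
    sign of z is reversed to keep 'higher T-score = better performance'.
    """
    m, s = mean(raw), pstdev(raw)
    sign = -1 if lower_is_better else 1
    return [50 + 10 * sign * (x - m) / s for x in raw]


def battery_scores(t_scores_by_test):
    """Unweighted battery score: sum of each pupil's T-scores across tests."""
    return [sum(pupil) for pupil in zip(*t_scores_by_test)]


# Hypothetical raw scores for four pupils on two tests.
dash_seconds = [7.2, 7.8, 8.1, 7.5]
broad_jump_in = [70, 64, 60, 68]
tests = [linear_t_scores(dash_seconds, lower_is_better=True),
         linear_t_scores(broad_jump_in)]
battery = battery_scores(tests)
print([round(b, 1) for b in battery], round(mean(battery), 6))
```

With two tests the battery mean is 2 × 50 = 100; with the eight tests of the text the same sum gives 8 × 50 = 400.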

Multiple Correlation of the Battery with the Criterion. 
In order to determine how closely we may predict the physical skill 
of an individual from a battery of tests such as has been outlined, 
we need to know first the size of the multiple correlation coefficient. 


TABLE XXVI
TEST INTERCORRELATIONS: FINAL BATTERY

Variables   (0)    (2)    (5)    (7)    (9)    (11)   (13)   (17)
(2)        .570
(5)        .723   .456
(7)        .460   .253   .567
(9)        .627   .112   .472   .321
(11)       .654   .322    [?]   .187   .396
(13)       .707   .372    [?]   .162   .331   .410
(17)       .565   .582   .291   .195   .288   .540   .372
(18)       .729   .395   .405   .321   .892   .470   .523   .566

[?] Entry illegible in the source.


As has been said before, the computation of R by means of partial
correlation technique would be an interminable process. Kelley’s
technique of successive approximations, however, will give us an R
rather quickly and, without going into the detail of any other part
of the method, the R will be computed.¹⁶ In order to eliminate
much of the material not used from the previous table of
intercorrelations, a new table will be presented taking into
consideration only the intercorrelations of the tests in the final battery.

In finding the multiple correlation coefficient between the composite
score and the criterion, it is desirable to work out the computations
in tabular form. Table XXVII shows in tabular form all
intercorrelations multiplied by the weights of each test, that is, the
weight of each row and each column. These weights being one, no
change need be made from the previous table except to put down
each intercorrelation twice, so that all spaces in the rows and columns
will be occupied except those indicating reliability coefficients, that
is, self-correlations.

¹⁶Kelley, T. L., op. cit., p. 505.


TABLE XXVII
COMPUTATION OF MULTIPLE CORRELATION COEFFICIENT (R)¹⁷
Form I

Variables   (0)   (2)   (5)   (7)   (9)   (11)   (13)   (17)   (18)
Wts.               1     1     1     1     1      1      1      1     (Wts.)²

[The body of this table is illegible in the source.]

¹⁷Kelley, T. L., op. cit., formula [number illegible], p. 505.

The standard deviation of the composite score may be obtained from
the formula¹⁸

$$\sigma_c = \sqrt{S w^2 + S} = \sqrt{8 + 19.128} = 5.21,$$

where $S w^2$ is the sum of the squared weights (here 8) and $S$
standing alone is the sum of all the intercorrelations in the table
(here 19.128). Then

$$R = \frac{5.035}{5.21} = .966,$$

the correlation of the battery composite score with the criterion,
5.035 being the sum of the correlations of the eight tests with the
criterion.

¹⁸Kelley, T. L., op. cit., formula (292), p. 305; formula (295), p. 505.

In order that we may see how well the tests chosen were picked, let
us take another combination by picking the test in each element
correlating highest with the criterion. Our computation table then
will be as follows:
TABLE XXVIII
COMPUTATION OF MULTIPLE CORRELATION COEFFICIENT (R)
Form II

Variables        (0)    (2)    (6)    (7)    (9)    (11)   (13)   (16)   (19)
         Wts.            1      1      1      1      1      1      1      1     Wts.²
(2)       1     .570   .....  .487   .253   .112   .322   .372   .297   .387     1
(6)       1     .726   .487   .....  .301   .571   .453   .382   .361   .536     1
(7)       1     .460   .253   .301   .....  .321   .187   .162   .221   .333     1
(9)       1     .627   .112   .571   .321   .....  .396   .331   .251   .421     1
(11)      1     .654   .322   .453   .187   .396   .....  .410   .495   .429     1
(13)      1     .707   .372   .382   .162   .331   .410   .....  .409   .545     1
(16)      1     .579   .297   .361   .221   .251   .495   .409   .....  .499     1
(19)      1     .740   .387   .536   .333   .421   .429   .545   .499   .....    1
Sums           5.063  2.230  3.091  1.778  2.403  2.692  2.611  2.533  3.150     8

S = 20.488        Sw² = 8

$$\sigma_c = \sqrt{S w^2 + S} = \sqrt{8 + 20.488} = 5.34$$

$$R = \frac{5.063}{5.34} = .948$$


Thus it may be seen that the first selection (.966 as against .948) 
is the better one of the two and that the size of the intercorrelations 
materially affects R. 
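The two computations above can be checked with a short script. It assumes standardized tests with unit weights, so that the composite's standard deviation is √(n + S) and its correlation with the criterion is the sum of the criterion correlations divided by that figure. Strictly, this is the correlation of the equally weighted sum with the criterion, which the text treats as R for the battery; a least-squares multiple R would use optimal weights instead. The Form II values are read from Table XXVIII; where the scan is ambiguous, the entries used are those that reproduce the printed column totals and S = 20.488, which is an editorial assumption. The function name is hypothetical.

```python
from math import sqrt


def composite_criterion_r(criterion_rs, lower_rows):
    """Correlation with the criterion of an unweighted sum of standardized tests.

    criterion_rs: correlation of each test with the criterion, in table order.
    lower_rows:   row i lists the intercorrelations of test i with tests 0..i-1.
    Returns (r, S, sigma), where S counts every intercorrelation twice, as the
    tables do, and sigma is the standard deviation of the composite.
    """
    s = 2 * sum(sum(row) for row in lower_rows)
    sigma = sqrt(len(criterion_rs) + s)
    return sum(criterion_rs) / sigma, s, sigma


# Form II: tests (2), (6), (7), (9), (11), (13), (16), (19) of Table XXVIII.
criterion_rs = [.570, .726, .460, .627, .654, .707, .579, .740]
lower_rows = [
    [],
    [.487],
    [.253, .301],
    [.112, .571, .321],
    [.322, .453, .187, .396],
    [.372, .382, .162, .331, .410],
    [.297, .361, .221, .251, .495, .409],
    [.387, .536, .333, .421, .429, .545, .499],
]
r_form2, s_form2, sigma_form2 = composite_criterion_r(criterion_rs, lower_rows)

# Form I needs only the two totals quoted in the text.
r_form1 = 5.035 / sqrt(8 + 19.128)

print(round(s_form2, 3), round(sigma_form2, 2), round(r_form2, 3), round(r_form1, 3))
```

Carried to full precision these come to about .967 and .949; the text's .966 and .948 result from dividing by the rounded figures 5.21 and 5.34. Either way the first selection comes out ahead.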

Standard Deviation of the Battery. The formula
$\sigma_c = \sqrt{S w^2 + S}$ gives the standard deviation of the battery
in terms of unit measures; that is, it considers that all tests have a
standard deviation of one. Since all tests when computed in T-score
units have a standard deviation of 10, the true standard deviation
of the battery will be ten times 5.21, or 52.1, because of the
construction of the T-scale, in which 10 T-score units equal a
standard deviation.


Т, 7 


This figure (52.1) may be checked by the use of the formula¹⁹

$$\sigma_{\text{weighted battery}} = \sqrt{S_1^{a} w_p^2 \sigma_p^2 + S_1^{a-a} w_p w_q \sigma_p \sigma_q r_{pq}}$$

Since in the problem at hand all weights (w) are one and all
standard deviations are equal (being ten), the formula reduces to

$$\sigma_{\text{battery}} = \sqrt{S_1^{a} \sigma_p^2 + S_1^{a-a} \sigma_p \sigma_q r_{pq}}$$

$S_1^{a}$ refers to a summation of tests from one to a, or eight.

$S_1^{a-a}$ refers to the intercorrelations of all tests, not including the
reliability coefficients.²⁰

Substituting in the formula,

$$\sigma_{\text{battery}} = \sqrt{800 + 19.128(100)} = \sqrt{2712.8} = 52.1$$

The figure 19.128 is the sum of all the intercorrelations as shown in
Table XXVII.
¹⁹From Kelley, T. L., op. cit., formula (163), p. 213.
²⁰For a discussion of the meaning and use of these terms, see Cozens, Frederick W., op. cit., pp. 145-148, 162.

Determination of Battery Reliability. In order to compute the
effectiveness of a battery of tests it is necessary to determine the
reliability of the battery. The use of the Spearman-Brown prophecy
formula

$$r_{nn} = \frac{n\,r_{11}}{1 + (n-1)\,r_{11}}$$

in such an instance necessitates the assumption that the average
reliability of the subtests in the battery is equal to the average of
all possible intercorrelations between the subtests. When this is not
true, as is clearly evident from an inspection of the problem at hand
(see Table XXVII), the use of the Spearman-Brown formula will yield
a spuriously high reliability coefficient for the battery. The formula
really applicable is one for the correlation between sums of a series
of tests not strictly comparable. The correct formula for use has
been adapted from that given by Kelley²¹ and may be written as
$$r_{\text{battery}} = \frac{S_1^{a} w_p^2 r_{pp} + S_1^{a-a} w_p w_q r_{pq}}{S_1^{a} w_p^2 + S_1^{a-a} w_p w_q r_{pq}}$$

$S_1^{a}$ refers to the sum of tests from 1 to a, or in this problem 1 to 8.

w refers to the weight of each test, namely one.

$r_{pp}$ refers to the reliability coefficients of the separate tests.

$r_{pq}$ refers to all possible intercorrelations between the tests, not
including the reliability coefficients.

Since all the weights are one, all “w’s” may be disregarded except
in $S_1^{a} w_p^2$, and, if we substitute the appropriate values, the various
terms become:

$S_1^{a} w_p^2 r_{pp}$ = .926 + .912 + .570 + .816 + .965 + .917 + .930 + .850 = 6.886.

$S_1^{a-a} w_p w_q r_{pq}$ = the sum of all intercorrelations = 19.128 (see Table XXVII).

$S_1^{a} w_p^2$ = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8.

Our battery reliability coefficient then is

$$r_{\text{battery}} = \frac{6.886 + 19.128}{8 + 19.128} = \frac{26.014}{27.128} = .959$$
This is much higher than the average reliability coefficient of the 
various tests in the battery and indicates the fact that the length of 
the battery as against a single test increases reliability. 
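The battery figures of the last two sections can be reproduced from the totals the text gives: eight tests, the reliabilities listed above, and S = 19.128. The sketch assumes unit weights and equal test standard deviations, as the text does; the function names are hypothetical. For comparison it also steps up the average test reliability with the Spearman-Brown formula. Treating that average as the $r_{11}$ of the formula is this sketch's assumption about how the formula would be misapplied here, and it gives a noticeably higher value, in line with the text's warning.

```python
from math import sqrt


def battery_sd(n_tests, intercorr_sum, test_sd=1.0):
    """SD of an unweighted sum of tests sharing one SD (each pair counted twice in intercorr_sum)."""
    return test_sd * sqrt(n_tests + intercorr_sum)


def battery_reliability(reliabilities, intercorr_sum):
    """Reliability of an unweighted sum of standardized tests (all weights one)."""
    return (sum(reliabilities) + intercorr_sum) / (len(reliabilities) + intercorr_sum)


def spearman_brown(r11, n):
    """Spearman-Brown prophecy: reliability of a test lengthened n times."""
    return n * r11 / (1 + (n - 1) * r11)


reliabilities = [.926, .912, .570, .816, .965, .917, .930, .850]  # from the text
S = 19.128                                                      # from Table XXVII

sd_units = battery_sd(8, S)           # about 5.21 in unit measures
sd_t = battery_sd(8, S, test_sd=10)   # about 52.1 in T-score units
r_battery = battery_reliability(reliabilities, S)
r_sb = spearman_brown(sum(reliabilities) / len(reliabilities), len(reliabilities))
print(round(sd_units, 2), round(sd_t, 1), round(r_battery, 3), round(r_sb, 3))
```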

Other Means of Computing Battery Reliability. The battery 
reliability coefficient may be very readily computed in another way it 
²¹ See Kelley, T. L., op. cit., formula (148), p. 198. The adaptation has been fully discussed by Frederick W. Cozens, op. cit., pp. 145-148, 163. See also Douglass, H. R., and Cozens, F. W., "On Formula for Estimating the Reliability of Test Batteries," Jr. Educ. Psychol., XX (May, 1929), 369-377.


Test Construction in Physical Education 359


the battery has been administered twice to the same group. The scores on the first test may be correlated with those on the second, and the reliability coefficient determined in this manner. This may or may not be a practical means of determination, according to the scope of the testing program. This method, though not particularly satisfactory …, may prove trustworthy with a physical test.

If the battery is administered but once, a possible method of determining reliability would be to break the battery up into chance halves by adding the scores of the odd-numbered tests and correlating this sum with the sum of the scores of the even-numbered tests. The coefficient thus obtained, $r_{hh}$, may be entered in the Spearman-Brown formula,²²

$$r = \frac{2\,r_{hh}}{1 + r_{hh}},$$

and the result may be used as the reliability coefficient of the battery.

Another method may be used, namely, correlating scores on two equivalent forms (batteries) measuring the same quality and applied to the same group of pupils. In mental tests this is considered by far the most trustworthy method.²³ This method may be used when two different forms of a battery have been set up as indicated in the computations on the previous pages.
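The following minimal Python sketch covers the three procedures just described: test-retest, odd-even halves stepped up by the Spearman-Brown formula, and equivalent forms. It assumes each pupil's scores are held as a list in battery order. The helper names are hypothetical, and the product-moment correlation is written out by hand only to keep the sketch free of third-party libraries.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a correlation between two half-batteries up to full-battery length."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(score_rows):
    """score_rows: one list per pupil, test scores in battery order (test 1 first)."""
    odd_sums = [sum(row[0::2]) for row in score_rows]   # tests 1, 3, 5, ...
    even_sums = [sum(row[1::2]) for row in score_rows]  # tests 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_sums, even_sums))


def retest_reliability(first_scores, second_scores):
    """Test-retest or equivalent-forms reliability: a plain correlation of totals."""
    return pearson_r(first_scores, second_scores)
```

Only the odd-even procedure needs the Spearman-Brown step-up, because each half is only half as long as the battery. The retest and equivalent-forms coefficients already refer to the full battery.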

The Prediction of Physical Skill from a Test Score. (Standard error of estimate.) In any test it should be a matter of concern how closely true ability can be predicted. That is, for any given individual we have a particular battery score recorded, and we now wish to know how closely this score measures the true ability of the individual (the true criterion ability). The formula²⁴

$$\sigma_{c\infty\cdot 1} = \sigma_c\sqrt{r_{cc} - r_{c1}^2}$$

is the correct one for use. The notation in this formula for the standard error of estimate should be made plain. The subscript ∞ has been adopted to refer to true ability, c to the criterion, and 1 to the battery or test. The formula states that the standard error of a true criterion score or ability estimated from a battery score equals the standard deviation of the criterion scores times the square root of the reliability coefficient of
²² Kelley, T. L., op. cit., formula (158), p. …
²³ Ruch, G. M., and Stoddard, G. D., op. cit., pp. 356, 361-365.
²⁴ Kelley, T. L., op. cit., formula (167), p. 214.


TABLE XXIX
Battery Scores … Physical Activity Ages, Junior High School Boys
[Table body and the accompanying computations of mean battery scores for each age group are illegible in the source scan.]


criterion scores minus the square of the coefficient of correlation between battery scores and criterion. Assuming the following values and substituting in the equation,

$\sigma_c = 44$, $r_{cc} = .986$, $r_{c1} = .966$

$$\sigma_{c\infty\cdot 1} = 44\sqrt{.986 - .933} = 10.1$$

(.933 being the square of .966). This means that in 68 per cent of the cases we can estimate an individual's true ability within 10.1 points. Considering that criterion ability embraces 8 elements, each having an average score of 50 and hence a mean criterion score of 400, the error involved in using the short battery is relatively small.
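The standard error of estimate above can be reproduced with a few lines of Python. The function name is hypothetical. The check that r_c1 squared does not exceed r_cc is an added safeguard, since the quantity under the root would otherwise be negative.

```python
from math import sqrt


def se_estimate_true_criterion(sd_criterion, r_cc, r_c1):
    """Standard error of estimating true criterion ability from a battery score:
    sigma_c * sqrt(r_cc - r_c1**2)."""
    radicand = r_cc - r_c1 ** 2
    if radicand < 0:
        raise ValueError("r_c1 squared cannot exceed the criterion reliability r_cc")
    return sd_criterion * sqrt(radicand)


# Values used in the text: sigma_c = 44, r_cc = .986, r_c1 = .966
print(round(se_estimate_true_criterion(44, 0.986, 0.966), 1))  # 10.1
```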

Another standard error of estimate is also desirable, namely, the probable divergence of an obtained battery score from its corresponding true score. Some writers have called this error the standard error of measurement. It indicates how closely true battery scores may be estimated from obtained battery scores. The formula

$$\sigma_{1\infty\cdot 1} = \sigma_1\sqrt{1 - r_{11}}$$

is used in a case of this kind.²⁵ In this formula $\sigma_1$ refers to the standard deviation of the battery and $r_{11}$ to its reliability coefficient. Substituting the appropriate values ($\sigma_1 = 52.1$, $r_{11} = .959$), the equation becomes

$$\sigma_{1\infty\cdot 1} = 52.1\sqrt{1 - .959} = 10.5$$


In interpreting this standard error we may say that in 68 per cent 
of the cases true battery scores may be estimated from obtained 
battery scores within 10.5 points. 
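A comparable sketch for the standard error of measurement follows, again with hypothetical function names. The band is centred on the obtained score, as in the interpretation just given. A regression estimate of the true score, which would pull it toward the mean, is not attempted here. Treating one standard error as covering about 68 per cent of cases assumes normally distributed errors.

```python
from math import sqrt


def se_measurement(sd_battery, r_11):
    """Standard error of measurement: sigma_1 * sqrt(1 - r_11)."""
    return sd_battery * sqrt(1 - r_11)


def true_score_band(obtained, sd_battery, r_11):
    """Obtained score plus or minus one standard error (about 68 per cent of cases)."""
    se = se_measurement(sd_battery, r_11)
    return obtained - se, obtained + se


# Values used in the text: sigma_1 = 52.1, r_11 = .959
print(round(se_measurement(52.1, 0.959), 1))  # 10.5
low, high = true_score_band(400, 52.1, 0.959)  # a battery score at the mean, 400
```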

Establishing Physical Activity Age Norms. It has been 
supposed that T-score values have been determined from one 
distribution or performance scale of all ages of boys in the Junior 
High School. That is to say, at the start there has been no separation 
into age groups and the mean of the battery scores (400 points) will 
represent what the average boy in the Junior High School can do. 
It is logical to assume that thirteen-year-old boys can perform better than twelve-year-olds, that fourteen-year-olds will average higher than thirteen-year-olds, and so on, and that if we wish to set up norms for the various ages we shall have to make special distributions for them. Table XXIX will illustrate the working material. The ages have been taken as 12½, 13½, etc., because these figures


²⁵ Kelley, T. L., op. cit., discussion, p. 222.




represent boys who are scattered throughout the year in question.
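The special distributions for each age can be summarized in Python as a norm (mean) and a middle-50-per-cent range, the two figures reported in Table XXX. The function name and the dictionary layout are hypothetical. The book's quartiles were probably computed from grouped frequency distributions, so the `statistics.quantiles` default ("exclusive") method may give slightly different limits.

```python
from statistics import mean, quantiles


def age_norms(scores_by_age):
    """Norm (mean) and middle-50-per-cent range of battery scores per age group.

    scores_by_age: dict mapping an age label such as "13-6" to the list of
    battery scores made by boys of that age (a hypothetical input layout).
    Each group needs at least two scores for quantiles() to work.
    """
    norms = {}
    for age, scores in scores_by_age.items():
        q1, _median, q3 = quantiles(scores, n=4)  # default 'exclusive' method
        norms[age] = {"norm": mean(scores), "middle_50": (q1, q3)}
    return norms
```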

A few examples may be given to show how scores made by boys 
of various ages may be interpreted. Table XXX will assist in this 
interpretation. 

Example 1. Boy aged 15-0 makes score of 385.

His physical activity age is about thirteen years and three months, and he is most certainly in the lower 10 per cent of the boys at his chronological age.

Example 2. Boy aged 13-9 makes score of 442.

His physical activity age is approximately fifteen years and three months. This boy is in the upper 15-20 per cent of the boys in his chronological age group.

Example 3. Boy aged 13-6 makes score of 390.

He is approximately at age in physical activity.

Example 4. Boy aged 14-0 makes a score of 376.

This boy is retarded in his physical activity about a year but may still be included in the middle 50 per cent of the fourteen-year-old group.

It should be pointed out that norms of performance, differing somewhat from those suggested here, have been set up for boys and girls in a variety of age-height-weight classification groups.²⁶


TABLE XXX
Physical Activity Age Norms

Chronological Age    Norm    Middle 50 Per Cent
*16-3                491
*16-0                480
*15-9                469
15-6                 457     426-489
*15-3                445
*15-0                434
*14-9                423
14-6                 412     385-441
*14-3                …
*14-0                402
*13-9                397
13-6                 392     360-424
*13-3                …
*13-0                …
*12-9                …
12-6                 …       …-389
*12-3                …

* Norm is interpolated.
²⁶ Neilson, N. P., and Cozens, Frederick W., op. cit.; Cozens, Frederick W., Trieb, Martin H., and Neilson, N. P., op. cit.
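Reading a physical activity age from a score, as in the examples above, amounts to finding the age whose norm matches the score. The Python sketch below uses only the three non-interpolated norms legible in Table XXX (13-6: 392, 14-6: 412, 15-6: 457). It interpolates along a straight line between them, which appears to be how the asterisked entries were filled. Extrapolating beyond 13-6 or 15-6 from the nearest segment is an added assumption. The function names are hypothetical.

```python
# Non-interpolated norms from Table XXX: (age in months, norm score)
ANCHORS = [(13 * 12 + 6, 392), (14 * 12 + 6, 412), (15 * 12 + 6, 457)]


def physical_activity_age(score, anchors=ANCHORS):
    """Age in months whose norm equals `score`, by straight-line interpolation.

    Scores outside the anchor norms are extrapolated from the nearest segment
    (an assumption, not a rule from the text).
    """
    if score <= anchors[0][1]:
        (a0, s0), (a1, s1) = anchors[0], anchors[1]
    elif score >= anchors[-1][1]:
        (a0, s0), (a1, s1) = anchors[-2], anchors[-1]
    else:
        for (a0, s0), (a1, s1) in zip(anchors, anchors[1:]):
            if s0 <= score <= s1:
                break
    return a0 + (score - s0) * (a1 - a0) / (s1 - s0)


def as_years_months(months):
    """Format a month count in the table's years-months style, e.g. '15-2'."""
    m = round(months)
    return f"{m // 12}-{m % 12}"


print(as_years_months(physical_activity_age(442)))  # 15-2
```

For Example 2 (a score of 442) the sketch gives 15-2, close to the "approximately fifteen years and three months" read from the finer interpolated table.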




Preparation of a Manual for Use in Administering 
the Test Battery 

If the test or battery of tests is to be made usable for any but the individuals who formulated it, a well-edited manual of directions should be worked out and published. This manual should contain the following items:

Purpose of the Test. This account need not be long but it should 
state explicitly for what purpose the test battery may be used. 

The Basis for the Construction of the Test. This discussion 
will include a summary of the guiding principles used in the selection 
of material, preliminary studies for foundation elements and the like. 

A Description of the Test. Each of the tests used in the study 
should be described in enough detail so that they become readily 
understandable. Apparatus should be diagrammed or shown in cut 
form with dimensions specified. Directions to be given to pupils 
should be printed as should also certain directions for the smooth 

administration of the tests. If fields or courts are to be marked off in a specified way, this information should be conveyed by means of diagrams.

Method of Validation. The criteria for establishing validity 
should be briefly summarized and the degree to which the battery 
measures what it purports to measure should be set down. 

Reliability of the Test. Facts about the reliability of the battery as a whole and of each item in the battery should be recorded, and information should be given as to the use of the test for group or individual measurement.

Standardization and Directions for Scoring. It is important 
that a statement be made regarding the process of standardization. 
Test norms established from data collected in Los Angeles are good only for Los Angeles. Although they may differ but slightly from those of some other community or city, they cannot be taken as norms to be used outside of that city. If material for the norms has been collected from all parts of the country, an account of this process should be set forth.

Tables of norms should be presented for all tests and directions 
for scoring given. 

Use of the Results. The Manual of Directions should contain a 
discussion of the use of the results for purposes of classification, 
diagnosis, remedial teaching and the like. Examples should be 
given of each use of the test and all possible applications of the tests 
for teaching purposes should be pointed out. 


Selected References 


Garrett, Henry E.: Statistics in Psychology and Education, Second Edition, Chapter …VII. New York, Longmans, Green and Company, 1945. Pp. xiv and …

This chapter discusses a number of applications of statistical techniques to 
tests and test results. 

Greene, Harry A., Jorgensen, Albert N. and Gerberich, J. Raymond:
Measurement and Evaluation in the Secondary School, Chapter V. New York, 
Longmans, Green and Company, 1945. Pp. xxvi and 670. 

Presents a brief but excellent discussion of the problems involved in the 
construction of standardized tests. 

Guilford, J. P.: Fundamental Statistics in Psychology and Education, Chapter XIV. New York, McGraw-Hill Book Company, 1942. Pp. xi and 555.

The chapter on the reliability and validity of tests deals with such problems
as the importance of reliability and validity, methods of determining reliability, 
criteria of validity, combining battery items, item analysis and indices of item 
validity. 

Hull, Clark L.: Aptitude Testing. Yonkers-on-the-Hudson, New York, World Book Company, 1928. Pp. xiv and 536.

Chapter VIII, "The Composition and Yield of Test Batteries," pp. 254-278, gives the student an excellent understanding of the idea of combining several test items into a battery.

Chapter IX, "The Psychological Analysis of Occupational Behavior," pp. 281-301, presents a general outline of the six steps involved in constructing a test battery.

Chapter X, “The Assembling of a Trial Battery of Tests,” pp. 302-339, 
contains excellent material on "paper and pencil" tests and the problems 
connected with the use of apparatus. 

Chapter XI, "Administering the Preliminary Test Battery to a Trial Group of Subjects," pp. 340-375; Chapter XIII, "Selecting the Final Aptitude Battery," pp. 421-456; Chapter XIV, "Combining the Tests to Secure the Maximum Forecasting Efficiency," pp. 457-490.

These three chapters will be very helpful to those students engaged in a 
problem involving the setting-up of a test battery to measure a particular 
quality. 

Kelley, Truman L.: Statistical Method. New York, The Macmillan Company, 1924. Pp. xi and 390.

Section 84, pp. 302-310. When dealing with more than five variables in a
problem involving partial and multiple correlation technique, it is highly 
important that students learn how to use Kelley's “Method of Successive 
Approximations.” It is a valid procedure and a great time-saver. 

Section 54, pp. 196-200, “Correlations of Sums or Averages,” is of great value 
in computing the reliability coefficient of a battery of tests. 

Ruch, G. M. and Stoddard, George D.: Tests and Measurements in High School Instruction. Yonkers-on-the-Hudson, New York, World Book Company, 1927. Pp. xix and 381.

Part IV, "The Construction of Educational and Mental Tests," pp. 301-375, gives four excellent chapters covering validation, setting up a criterion of
validity, selection of items, experimental try-out of items, equivalent forms, 
derivation of norms and determination of reliability. 


CHAPTER XIX
Program Organization and the Technique of Test Administration


Organization of the Testing Program 


No categorical rules can be set for the organization of testing 
programs. Organization plans will be determined by program 
objectives. It is recommended at this point that the student review 
the material in Chapter I on the importance and use of measurement 
in the physical education program. It was there stated that testing, 
not an end in itself, serves as an integral phase of the teaching 
process. Further, testing per se is one aspect of the total process of
evaluation, which aims to ascertain the extent to which educational 
objectives are being achieved in order that the instructional program 
may be better adapted to meet pupil needs. Thus testing programs 
must vary with the structure and objectives of each school program. 
In addition testing programs are affected by the equipment and 
facilities of the school, the experience and preparation of the teacher, 
the general school attitude toward testing, the administrative pro- 
cedures of the school program, and many other factors. Several 
general policies, however, may be offered as guides for organizing 
and conducting testing programs. 

1. The testing program should be carefully planned. First, it must be kept in mind that the testing program should be planned in relation to the total program of evaluation. While measurement is the preferred means of evaluation, the need to utilize qualitative evaluative procedures as well must be recognized. Tests selected for the program must be considered in relation to the other evaluative procedures which will be used, the final purpose being to provide as much valid data and information as possible about the factors being appraised.

Program Organization and Test Administration 367
The testing program implies careful planning far in advance. When the program of activities for several years, one year or a term is being planned and time schedules set up, tests to be used should be included in the plans. Long-term planning is essential to provide as much cumulative information as possible about the student and to establish the machinery for progressively expanding the scope of the testing program. For the school in which little has been done in physical education testing, a fully developed testing program cannot be installed in one school year. It involves a long process of learning testing techniques, developing pupil efficiency in test-taking procedures, organizing testing materials, installing a functional record system, and making the other necessary administrative adjustments.
The plans for the testing program must allow for considerable flexibility, since developments in the teaching situation will affect the need for and use of tests. Conversely, the results found in testing may reveal need for change in teaching emphases or program plans. Tests serve their best function when the results cause the teacher to rethink his objectives, analyze his methodology, and redirect his teaching in terms of pupil needs.

2. Tests should be selected on the basis of acceptable criteria. Chapter … deals with the criteria of good tests. These criteria (validity, reliability, objectivity, administrative economy, use of norms, equivalent forms, and standardized directions) should guide the selection of tests used.

3. Tests should be selected with a definite purpose in mind. A test should be selected to serve a precise function. Merely to use a test recommended by experts or authority without first determining the use to which it may be put in terms of program objectives represents a needless waste of time. Results of a test, however, should be used for as many different purposes as possible. Seven uses of tests were discussed in Chapter I. Often a test can be used for several purposes: the results of a test used to classify students according to ability may also be used to guide teaching for that pupil, as a basis for determining later degree of progress, and the like.

4. After tests meeting acceptable criteria have been selected, they should be administered and scored according to exact directions. Little


is gained if otherwise valid, reliable and objective tests are carelessly 
administered. Common faults include omitting items from test 
batteries, limiting number of trials, modifying equipment, or in 
other ways deviating from the original directions of the tests. While 
many teachers will be expected to modify available tests to meet 
local needs, and attempt to improve existing tests, these changes 
should be approached scientifically and with full knowledge of how 
the changes affect results. Tests are not improved, however, by 
random modifications by the teacher. 

5. The results of testing should be used. While this policy may 
appear too obvious for mention, failure to make use of test results 
is probably one of the most common faults in testing programs. 
Little educational justification exists for utilizing time for testing 
if results are not used in one of the many possible ways to improve 
the quality of the program or the student's adjustment thereto. 

6. Tests should be efficiently administered in order to consume as 
little teaching time as possible. No arbitrary standard can be set, but 
it is generally accepted that not more than ten per cent of the 
teaching time should be consumed by testing. The next section, 
The Technique of Test Administration, illustrates many devices 
which will aid in saving time in testing. Efficiency of test adminis- 
tration and the use of testing as a teaching device when feasible 
enhance the effectiveness of the testing program that can be con- 
ducted with the little time available in the average physical educa- 
tion program. 

7. Safety hazards should be kept to a minimum. The usual safety 
problems in physical education become magnified under testing 
situations. The general excitement, the tenseness of the pupils, the 
overmotivation and other factors dictate the need for exceptional 
concern with safety problems during test periods. Special attention 
should be given to safety precautions, such as checking facilities and 
equipment, using spotters, organizing students to avoid congestion 
in testing areas, and keeping tracks, throwing areas and the like 
clear of contestants and spectators. 

8. Results of testing should be preserved in cumulative form for each 
pupil. The limitations of a single test as a predictive device have 
been referred to several times throughout this text. Major decisions 
should not be made regarding a pupil on the result of one test. The 
greatest value accrues when a variety of data and information are 
gathered for each pupil and compiled on a cumulative record form. 




9. The cooperation of the student should be sought in the testing program. Student cooperation can be stimulated by careful orientation as to the purpose of the tests, selecting tests inherently interesting to pupils when possible, permitting student participation in selecting and administering tests, avoiding overtesting, interpreting test results to the student, and by other good teaching techniques. Student cooperation not only greatly facilitates effective test administration, but also considerably adds to the worth of tests as instructional devices and as sources of reliable data.

10. Some testing, even though limited, should be a part of every physical education program. The administrative problems of testing should not be underestimated. Limited time, large classes, ineffectual measurement tools, lack of teacher experience in testing, and many other factors militate against extensive testing programs. These factors, however, represent general educational limitations, and should not be considered as justification for the omission of testing. A few tests carefully selected and administered are an excellent start toward the final goal of a carefully planned testing program covering many aspects of program objectives. As a teacher increases his facility in using tests and his knowledge of available tests, the scope of his efforts can be gradually extended. No program of physical education can be termed adequate which does not attempt to measure the effectiveness of its outcomes. A program so obviously limited by too large classes, short periods and poor facilities that it leads a teacher to say "testing is impossible in my situation" may be seriously questioned as to its educational value. Testing is even more important in such situations, to evaluate actual pupil achievement as a basis for critically determining whether or not the program is justified in terms of the educational needs of pupils. In testing lies a strong ally for the conscientious teacher attempting to overcome the administrative limitations placed on physical education programs in many situations.


The Technique of Test Administration

Many of the teacher objections to testing relate to factors of excessive time consumption and adverse pupil attitudes. Careful administration of tests will do much to counteract these limitations, as well as to increase the value of the results obtained. This section outlines practical suggestions for effective test administration.




Preliminary Arrangements. The first thing to consider under 
the heading of Test Administration is an arrangement for getting 
the pupils to the examining room or field or to the place where the 
tests are to be conducted. Such an arrangement will vary according 
to type of school. Elementary and high schools may find it necessary 
to administer the test or battery of tests during the regular class 
period and after classes have been organized and students assigned 
to definite sections. In the case of a test designed to classify students 
this is not an entirely satisfactory arrangement. 

Suppose, for example, that we have a school situation in which it 
is desired to classify boys and girls on the basis of their performance 
in tests and in which only one teacher of physical education is 
available. In such a situation, the one instructor will have to look 
after three groups of students during one class period, the skilful, 
the average and the inferior groups. This means, of course, that he 
will have to divide his time amongst the three groups. If it were 
possible to classify them into groups before they are assigned to a 
regular period, we would have a much better arrangement in so far 
as physical education is concerned, though it might prove difficult 


administratively for the other subjects. However, if the principal of the school is in accord with the advantages to be gained by classification, a study period, library period, shop period or assembly hall period might be assigned to those not in the physical education class period. A hypothetical case may be cited: Grades 5 and 6, with 75 pupils in each.


Periods of 30 minutes                                                 Number
Group A (Above Average and Superior students physically) ........   50
Group B (Average) ...............................................   50
Group C (Below Average and Inferior students physically) ........   50

These figures, we will assume, approximate twenty-five boys and twenty-five girls in each group. Four room teachers are available as well as a special teacher of physical education, making 5 in all.

                    Physical       Shop for boys,        Library period,
                    education      home ec. for girls    study period
                    2 teachers     2 teachers            1 teacher
Period III ......   A              B                     C
Period IV .......   …              …                     …
Period V ........   …              …                     …




In large city systems where several instructors in physical educa-
tion are provided for Junior and Senior High Schools, the general 
plan within each class period ought to be simple. Classify each class 
during the regular physical education period and divide into groups 
according to the number of instructors. 

The situation for a college and university is still less complicated, especially in a large institution. After the medical examination has been given to entering men and women in order to eliminate those not physically fit, appointments may be made for the test at so many to the hour. In order to afford ample opportunity for free periods, several days may be set aside for this work. As an example of such a scheme, a procedure used in a typical college situation may be cited.¹

At the beginning of the semester all new entrants are informed 
that they must take the test and are given regular appointments 
just as for the medical examination. These appointments must 
necessarily be given during the first week of the semester and after 
the medical examination has been taken, in order that all those 
physically unfit may be eliminated. Since systematic class work 
cannot begin immediately in physical education on account of the 
fact that lockers and gymnasium suits must be issued, this time is 
given over by department members to the administration of the 
tests. Men are informed that they must wear gymnasium suits and 
tennis shoes. 


Sixty men report at the beginning of each hour and are divided immediately into groups of 10 each. It should be stated here that the test cards are used for purposes of roll call. Men are lined up in double rank (i.e., company front formation) and take their card as the instructor calls the name.

Brace² gives the following explanation: "The procedure in giving the tests as a group test is to assemble the class in a gymnasium, room, hall or play yard, line up the class, and by counting off by two's or by marching, or by any other method, arrange the pupils in as many double lines or columns of two's as the space permits. These double lines then face together. Scoring blanks and pencils are then passed out from the ends of the lines."

Directions for Giving the Tests. As has been pointed out in 
the chapter on Test Construction, the directions for giving tests 
must be so explicit as to leave nothing to the imagination. If there 
are several ways of doing a stunt or exercise, definite instructions
regarding the procedure to use must always be set down. If there 

¹ Cozens, Frederick W., op. cit., Chapter …
² Brace, David K., op. cit., p. 10.




are instructions to give a group, these should be set down in writing 
and read to the group so that the same instructions are always given 
each group. If demonstrations are to be given, they must be given 
to each group in the same fashion. If practice is allowed, the same 
number of trials should be given to all.
A large number of factors relative to equipment to be used in 
physical education tests will have to be taken into consideration. 
These may include: 


1. Size and height of bar for chinning.
2. Distance apart of parallel bars for dipping.
3. Size of rope for climbing.
4. Method of checking time on half-lever.
5. Kind of equipment to be used in ball throwing tests. For example, in a football throw it is much easier to throw a new ball than an old one.
6. Size and type of target for various accuracy throwing tests, and distances from the ground.
7. Type of take-off board in jumping.
8. Signal to be used in starting a dash.
9. Method to be used in mounting to a horizontal bar.
10. Height of bar in doing such a stunt as the bar snap for distance.³

A sample set of test directions for a particular event may be used here to point out the care with which such material must be organized.⁴
Event No. 21: Pull-up (Chin).

Equipment Needed: 


A horizontal bar either inside or out. Bar must be capable of 
adjustment to at least “hang high." Diameter of bar, 1% inches. 


Description: 

With ordinary grasp, knuckles to the face, the contestant places 
his chin over the bar at each pull-up and comes down to a straight- 
arm hang. The pull-up is repeated as many times as possible. 


³ Good examples of test directions will be found in the tests of Brace, MacCurdy, Rogers and the achievement scales of Cozens, F. W., Trieb, M. H., and Neilson, N. P., Physical Education Achievement Scales for Boys in Secondary Schools. New York, A. S. Barnes and Company, 1936.

⁴ Cozens, Frederick W., Achievement Scales in Physical Education Activities for College Men, p. 27. Philadelphia, Lea and Febiger, 1936.


Rules: 


1. Chin over bar and straight-arm hang must be observed. 

2. The feet must not be allowed to touch the floor. 

3. No rest is allowed between pull-ups. 

4. More than one trial may be allowed after a reasonable rest 
period. 
5. The contestant's performance is recorded as the number of times he is able to pull up completely. No fractional parts of a pull-up shall be recorded.


Principles in the Preparation of Instructions. McCall⁵ sets forward the following guiding principles in the preparation of instructions:


1. “Instructions should be as brief as is consistent with an adequate 
understanding of what is to be done." In other words avoid confusion 
by brevity, relegate long sets of instructions to the waste basket, 
but see to it that adequacy is obtained by careful forethought and
experimentation. 

2. "Instructions should employ a demonstration and preliminary test." It is often easier to demonstrate than to tell, and children can imitate better than they can follow spoken directions. Demonstrations often secure better attention than directions. Suppose, for example, that we have a test of ability to mount a horizontal bar for time, and that the method of mounting is not prescribed: "In mounting you may use any method you wish: knee mount, front pull-over, kip or uprise." It will be exceedingly helpful to have all of these methods shown and to allow the boys to pick their own method. If a stunt is particularly complicated it is wise to allow a trial, especially if the event requires some warming-up. Thus, in a test like the baseball throw for distance, preliminary throws should be allowed for warming up the arm. Trial tests should be allowed in such events as the long dive or the frog stand. This is especially true if but one trial is to count. As a usual thing the best of three trials ought to be counted unless the event is fatiguing. Normally, in running tests one trial is sufficient unless the boy or girl falls down.

3. "Instructions should be adapted to and uniform for all who are to be tested." An essential feature of this principle is that a student should not fail to perform to some degree because he did

⁵ McCall, Wm. A., op. cit., pp. 80-90.


374 Theory and Practice of Test Administration 


not understand instructions. Further, instructions should not require material or a procedure which will not be available in all cases. For example, in throwing or putting tests, handedness must be taken into consideration: instructions should not prescribe holding the ball or shot in the right hand.

4. “The order of instructions should be the order of doing.” This principle may have little chance for application in physical education tests. However, in such tests as the dodging test for college men, discussed later in this chapter, or a potato race, the examiner might work out helpful hints that could be called to performers during the progress of the tests. In the dodging test, for example, this might be taken care of to some extent by painting lines describing the exact direction of the run.

5. “Instructions should be broken into action units.” This principle applies particularly to written tests but should be given consideration when dealing with a complicated physical test involving several parts.

6. “Instructions should equalize interest.” The general idea McCall sets out in this principle is that a pupil will be more interested when he is “informed of the general purpose of the test and of the general method by which he is to be scored.” Thus, in such a test as dodging, described later, the examiner might say, “If you analyze almost any game or sport you can think of, you will
come to the conclusion that the ability to change direction quickly 
is a very important factor in any physical activity. This test is 
designed to measure not only speed of legs, but the important ability 
to change direction quickly.” 

7. “Instructions to pupils should be accompanied by instructions to 
examiners.” In other words, the examiner should be told exactly 
what to tell pupils and how the test is to be applied. The layout of 
fields shown in this chapter illustrates a definite procedure which 
has been set forth for the benefit of the examiner, and for the speeding up of the measuring operation. Hints to the examiner which will aid him in the administration of the test should be set off from instructions to students in some convenient form such as black type, italicizing, underscoring, etc.

Preparation for the Testing. It should hardly be necessary to 
point out the importance of having fields, track, pits and stations 
in perfect order before the testing program begins. Yet, very often, 
little thought is given to this phase of the preparation. 


Program Organization and Test Administration 375


As regards the time factor, the necessity for careful preparation for rapid measuring is of the utmost importance. An example of field layouts for the measurement of throws and punts for distance may be seen on p. 376. These diagrams show a field marking for two events so that ten men may be tested in from four to six minutes. A third diagram shows the layout for a dodging run.

Organization of Examining and Recording Staff. Definite mimeographed or printed instructions covering all phases of the testing should be prepared, and at least one organization meeting should be held so that a complete understanding of the entire process may be given everyone by the organizer or chief-of-staff. Instructions should be so definite that any member of the staff of examiners may take charge of any station and carry on in an efficient manner.

The director must impress upon all the necessity for uniformity in the careful following of directions. Assistants may sometimes fail to realize the necessity of uniform procedure for obtaining reliable test results, and for this reason implicit adherence to directions must be required. Be sure also that a definite order of events or subtests in the battery be worked out so that there may be an alternation in the use of the same muscle groups. A brief explanation of the reason for the test ought to be given the students at the outset so that they will understand in general what it is they are striving to do and what outcomes they may expect.


The following set of instructions, used by the Men's Staff from 1928 to 1942 at the University of California at Los Angeles, will serve as an example of standardized directions to be given the examining force.⁶

DEPARTMENT OF PHYSICAL EDUCATION
TO MEMBERS, ASSISTANTS, AND RECORDERS
STANDARDIZED PROCEDURE IN GIVING THE CLASSIFICATION TEST

Order of Events
The card indicates the order of events. The Quarter Mile is run only after completing the other 6. Since groups [...] time on each of the 6 events, these groups must go in the order listed on the card. Unless Dodging is [...] the Baseball Throw.

Baseball Throw (see p. 376 for field layout)
The lines are marked off so that men throw from both ends of the field. Five pairs of throwers are called before the instructor and given these instructions:

⁶Cozens, Frederick W., op. cit.




[Diagram: Baseball Throw for Distance field; throw both directions.]

Fig. 21. Classification test field layout.




[Diagram: dodging course, with starting point marked.]

Fig. 22. Dodging test. Instructions: start at point “A,” then follow the arrows around the course to point [...]; continue over the same course until you have made two complete rounds.




“You will throw in pairs, each man standing behind the end-line when he throws. One minute will be given for warming up. Only three throws for distance will be allowed. Partners should get the throw for distance on the first bounce, immediately spot the point of the hit, look to their right to read the distance to the nearest 10 feet and estimate to the closest foot. The lines between the 10-foot numbered markers are 5-foot lines which will make the estimation easier. Call your partner's throw to him as soon as you have figured it. Remember that throws must be made from behind the end-line. When three throws each have been completed, report immediately to the recorder indicating your best throw. Bring your ball with you.”

Football Punt for Distance (see p. 376 for field layout)

The lines are marked off so that men punt from both ends of the field. Five pairs of punters are called before the instructor and given these instructions:
“You will punt in pairs, each man standing behind the end-line when he punts. [...] warming-up punts are allowed and only three punts for distance. Partners of men who have just punted should [...] your partner. The lines having numbered markers are 10-yard lines. These are separated by 5-yard lines. After spotting the point of the hit, look to the right (remember that you [...] line). Then with the help of [...] nearest yard and call this to your partner. Then go behind the end-line and take your trial.

“When three punts each have been completed, report immediately to the recorder indicating your best punt. Come in pairs and bring your ball with you.”


Bar Snap for Distance 


An adjustable horizontal bar must be used so that the bar height from the floor 
or ground may be set at 4 feet 6 inches. This test can be conducted on the field if 
adjustable bars are available. Mats must be placed for a landing area, if the test 
is conducted inside. If conducted on the field, a sand pit should be used for a 
landing area. This pit must be well dug up and loose. 

The stunt consists in grasping the bar while standing on the floor or ground facing 
it; the body is swung underneath with the feet close to the bar; the feet are shot 
upward at an angle approximating 45 degrees, the back is arched and the con- 
testant lets go of the bar at the right moment to give distance, landing on his feet. 

For ease in administration, stretch a tape on the floor or ground with the zero 
end directly beneath the plane of the bar. By standing in the pit, the instructor 
will be able to call off distances rapidly to the nearest inch. Three trials are 
allowed and the best of these recorded in feet and inches. 
Measurements are taken on the ground or floor from the plane of the bar to the point nearest the bar where any part of the contestant’s body touches the ground or mat.

Standing Broad Jump

No preliminary practice. Three trials allowed. Measurement toe to heel. The tape can be laid down with the zero end at the take-off board and distances read off very quickly.

Dip (On Parallels)
A recorder is not needed here.
Men start from straight-arm support at the end of the parallels and the number of dips is recorded as the score. The angle of the upper arm with the bar is required to be positive. In other words, the elbow must be higher than the shoulder.




Dodging 

Five 3-foot track lanes are laid out about 12 yards in length (see diagram, p. 377). Hurdles are placed as indicated.

The run may be described as follows: 

Starting from point A the runner goes straight ahead, turns right at hurdle B, left 
at hurdle C, right at D and around E, coming to the far point of D, then following 
the same path back to A as was originally taken. 

Two complete round trips are made, starting at A and ending at A. Time is 
taken from the word “go” until the finish line is crossed. Only one trial is allowed 
unless a runner gets confused and runs incorrectly. The groups of 10 should jog 
over the course twice before any individual runs for time. 


Quarter Mile 
Each original “group of 10” runs as a group. Time is called to the nearest 
second as runners cross the finish line. After finishing, runners should walk and 
keep in the order in which they finished, reporting back to the recorder after they 
have walked about 100 feet. The recorder marks scores only to the nearest second. 
As soon as one group has finished, the instructor should immediately organize and run another group.

TENNIS SHOES MUST BE WORN IN ALL EVENTS

Scoring the Results. Individual blanks or score cards are to be 
preferred over any other method in so far as the actual physical act 
of recording is concerned. Just who does this recording depends upon 
the method of administration of the test. 

Brace, for example, provides an individual sheet for each pupil, and his partner does the scoring after noting whether there has been success or failure. Cozens provides a 5 × 5 card which each man carries with him as he reports to the various stations (see Fig. 23).

In both of these tests the final score recorded is a sigma index score (on the even-step interval plan) computed on the basis of records actually made in the tests (known as raw scores). For a
complete discussion of the values of such scoring see Chapter XVI. 
If scale scores are to be used, tables of equivalents must be pro- 
vided so that the scale score may be readily secured at a glance. 
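The conversion from raw records to scale scores, and a table of equivalents built from it, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure behind the published tables (Chapter XVI discusses those scoring methods): it assumes a linear sigma scale on which the mean maps to 50 and three standard deviations on each side span 0 to 100, the range given below for the subtests of Cozens' classification test. The function names and the norms in the usage example (a mean of 200 feet and a standard deviation of 30 feet for a baseball throw) are hypothetical.

```python
def sigma_scale_score(raw, mean, sd, higher_is_better=True):
    """Convert a raw record to a 0-100 sigma scale score.

    Assumption: a linear scale on which the mean maps to 50 and three
    standard deviations on each side of the mean span the full 0-100
    range, so one standard deviation is worth 50 / 3 scale points.
    """
    z = (raw - mean) / sd
    if not higher_is_better:
        # Timed events (e.g. the quarter mile): a lower time is better.
        z = -z
    score = 50 + z * 50 / 3
    # Records beyond three standard deviations are clipped to the scale ends.
    return max(0, min(100, round(score)))


def equivalents_table(mean, sd, low, high, step, higher_is_better=True):
    """Tabulate raw-score -> scale-score equivalents for reading at a glance."""
    return {
        raw: sigma_scale_score(raw, mean, sd, higher_is_better)
        for raw in range(low, high + 1, step)
    }


# Hypothetical norms for a baseball throw: mean 200 ft, SD 30 ft.
table = equivalents_table(mean=200, sd=30, low=110, high=290, step=30)
for raw, scale in table.items():
    print(f"{raw:>4} ft  ->  {scale:>3}")
```

A printed table of this kind lets a recorder read the scale score directly instead of computing it for each man; in practice the steps would be much finer than the 30-foot steps used here.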

Section Assignments. If the test is a diagnostic one, section 
assignments should be made as soon after the completion of the test 
as possible. 

Two possibilities for making known such assignments may be set 
forth here. Suppose a case where the battery of tests is one made by 
special appointment outside of regular class hours. 

1. As soon as students have completed the tests they may take a 
shower, dress and report immediately to the physical education 
office. During the dressing period their scores have been computed 
by clerks, an examiner has diagnosed their weaknesses and their 




[Record card: for each event, columns for Field data, Score and Weight, with a Total; a Final Classification key (Superior 496 up; Above Average 399-495; Average 302-398; Below Average 205-301; Inferior 204 down); and assignment boxes for Track, Games, Boxing, Wrestling, Fencing, Handball, Tennis, Basketball, Golf, Swimming and Elective.]

Fig. 23. University of California at Los Angeles, Department of Physical Education for Men. Classification Test Record Card.


cards have been arranged in alphabetical order with assignments noted on them. While students are still in the gymnasium, they are enrolled in the proper section and another trip for assignment is not required. In this scheme, however, much clerical work must be completed in a comparatively short time, and departments may find that the pressure of this amount of administration is impossible. In such a case:

2. Students are asked to report the next day, note their assignment, which has been posted on the bulletin board, and then enroll. This will relieve the clerk at the office window of looking up assignments from a card file. The student has a chance to choose the section which best fits his schedule before he reports, and this will hence cause less congestion at the enrollment window.

" In either of the two methods cited there should be an instructor 
in charge of special adjustments. With large groups of students 
there are always adjustments which have to be made and special 
adjustment clogs progress at an enrollment window. 

When students are assigned to special work on the basis of a diagnostic test, it is important that the instructor be given as much information as possible about the weaknesses of his students. Some arrangement should be provided for presenting this information to the instructor, and it is suggested that the instructor receive a note of the individual's total score, classification, and scores made on particularly weak events. This note may be made on the student's enrollment card.


Selected References 


Cozens, Frederick W., Trieb, Martin H. and Neilson, N. P.: Physical Education Achievement Scales for Boys in Secondary Schools, Chapter III, “Procedures in Testing Program,” pp. 14-52. New York, A. S. Barnes and Company, 1936. Pp. vi and 155.
Here will be found samples of general instructions, organization, descriptions of events and rules in a large variety of physical education tests.

Davis, Elwood C. and Lawther, John D.: Successful Teaching in Physical Education, Chapter XXI. New York, Prentice-Hall, Inc., 1948. Pp. xiii and 617.
In addition to a listing of factors governing the installation of a testing program, there are guides in selecting a standardized test.

McCall, Wm. A.: Measurement, pp. 80-90. New York, The Macmillan Company, 1939. Pp. xv and 535.
The guiding principles in the preparation of test instructions seem to be particularly applicable to physical education.

Neilson, N. P. and Cozens, Frederick W.: Achievement Scales in Physical Education Activities for Boys and Girls in Elementary and Junior High Schools, Chapter II, “How to Give the Tests,” pp. 10-38. New York, A. S. Barnes and Company, 1934. Pp. x and 171.
Descriptions of events and testing procedures offer a variety of samples for the administration of physical education ability tests.

Ross, C. C.: Measurement in Today's Schools. New York, Prentice-Hall, Inc., 1947. Pp. xvi and 551.
Chapter VII, “Steps in the Testing Program,” while dealing primarily with written tests, has much practical material on general organization and administration of testing programs which is equally applicable to physical education.

Ruch, G. M. and Stoddard, George D.: Tests and Measurements in High School Instruction. Yonkers-on-the-Hudson, New York, World Book Company, 1927. Pp. xix and 381.
Part IV, “The Construction of Educational and Mental Tests,” pp. 301-375, while dealing primarily with construction, points out the necessity for perfecting the administration of the test.


CHAPTER XX 


Diagnosis 


Purpose of Diagnostic Tests. There is only one major purpose of diagnosis in any field of educational measurement, namely, that of determining as accurately as possible the true condition of the pupil with reference to his capabilities and skills. Especially is
this diagnosis important in physical education. When a pupil enters 
a new grade, transfers from one school to another, or is promoted 
from the elementary school into the junior high school, from the 
junior high school into the senior high school or from high school 
into college, his academic record goes with him. His future teachers 
and advisers know at least something of his former training along 
purely academic lines. His teacher of physical education knows little. 

It is highly important, then, if the teacher of physical education is 
to do his job in a scientific fashion, that he know the skills which the 
pupil has mastered, his capabilities, his strengths and his weaknesses. 
If it were possible for the teacher of physical education to watch 
this pupil for a period of weeks or months, that pupil might be 
placed satisfactorily for his own best good in the proper physical 
education section. But this scheme is impossible under our present 
arrangement of mass education. The teacher must know about the 
pupil in a very short time, in fact, immediately. Hence the necessity 
of a test which will immediately put the pupil into the proper classi- 
fication. Such a test may be termed a diagnostic test. It is a test 
which will assist the teacher in giving adequate instruction to the 
group or groups under his direction. 

A further purpose for a diagnostic test may be found in the medical and physical examination which should be given to pupils periodically while they remain in the same school system, and always at the beginning of their career in a new school system. The things which concern us particularly here may be put in the form of a number of questions:

1. Is the pupil physically fit to engage in the normal program of physical education, or should his activities be restricted? Particularly in this connection should careful consideration be given to heart and lung action. A number of attempts have been made to formulate tests of heart condition which can be administered by the layman (that is, a layman from the point of view of the medical profession). Among these are: (a) the Michigan Pulse Rate of Recovery Test (see p. 80); (b) the California Group Functional Test (see p. 81); (c) the [...] Test, particularly the adaptation by Tuttle (see p. 73). The efficiency of such tests has not as yet reached the point where we can regard them as actually having a high degree of diagnostic value. In general, these tests may separate the efficient hearts from the less efficient, but that is about all. Continued experimentation along the lines of cardiac functional tests is greatly needed so that it may be possible to formulate tests of this sort which can be used by the lay person to rate the fitness of individuals with a comparatively high degree of accuracy.

2. Does the pupil have physical handicaps which need special attention in the field of corrective physical education, that is, outside of heart or lung condition? Can he enter a normal program with these deficiencies and work for correction outside of the regular class period, or should he be required to devote his entire attention to correction?
Other questions of a similar nature arise, but they are not in the field of discussion presented here.

A further purpose for the use of diagnostic tests lies in measuring the efficiency of the teacher. Although there are drastic differences of opinion, the measurement of teacher efficiency by means of tests is in all probability a valid procedure if all [...]. In order to measure justly the efficiency of teachers in two situations, certain conditions must be considered. These may include any or all of the following:

a. The previous preparation of the pupils in the two situations ought to be approximately the same. Records of performance of the two groups to be measured ought to be obtained at the beginning of the period under comparison.

b. The two groups should show a very close agreement as to mental ability.

c. The groups should be comparable as to age and grade. 

d. The tests employed should measure the content of the course 
of study covered. 

e. The equipment provided for physical education must be com- 
parable. That is, how can boys or girls learn to jump unless they 
have a pit in which to jump; how can they strengthen their arm and 
shoulder-girdle muscles unless they have bars on which to work and 
how can they perfect their hand-eye, foot-eye coordination unless 
they have balls to throw and kick? 

Requirements for Diagnostic Tests. Ruch and Stoddard¹ set forth the following stringent requirements before tests can lay claim to actual diagnosis:


1. The school subject to be diagnosed must be broken into all of 
its important constituent unit skills or aspects, and each of these 
must be measured separately. 

2. Each of these units must be sampled widely enough (i.e., be covered by enough test items) that no important facts or skills are omitted.

3. Each of these units (whether designated as “Parts” of the total 
test, or as separate tests) must be provided with separate norms for 
the interpretation of scores by units. 

4. The score yielded by each unit of the total test must be reliable 
enough to stand alone as a score on an individual pupil in contrast 
with group measurement. 

5. No tabulation of individual errors should be required in order 
to arrive at a diagnosis. 

6. The analysis into units should be carried far enough that each unit parallels a unit in the courses of study; i.e., represents some one teaching unit.

7. The diagnosis should suggest the remedial or corrective pro- 
gram which should follow the diagnostic testing. 


A careful study of the seven criteria listed will convince any one of the rigid character of the requirements for a diagnostic test. One of the tests in physical education² which may be used for diagnostic


¹Ruch, G. M. and Stoddard, G. D., Tests and Measurements in High School Instruction, p. 19. Yonkers-on-the-Hudson, New York, World Book Company, 1927.




purposes was built around an analysis of college freshmen though 
it falls somewhat short in No. 2 and No. 6, named above. 

The physical fitness tests of Rogers and the motor ability tests of 
Brace are valuable for survey purposes or for general classification 
of boys and girls but are not particularly adapted to individual 
diagnosis, or to specific weaknesses in individuals. 

Individual Classification. To illustrate how the criteria listed may be satisfied, Cozens’ test of General Athletic Ability may be cited, although it is a study limited to college men only. Though it might be adapted to high school boys, no claims are put forward for any other group than the one mentioned.

1. The subject of general athletic ability has been broken up into its important constituent aspects. This was done by means of a composite judgment secured from a large number of people in the profession, who decided that the constituent elements of general athletic ability are:


(a) Arm and shoulder-girdle strength.
(b) Arm and shoulder-girdle coordination.
(c) Hand-eye, foot-eye, arm-eye coordination.
(d) Jumping or leg strength and flexibility.
(e) Endurance (sustained effort).
(f) Body coordination, agility and control.
(g) Speed of legs.


Each of these elements is measured separately by from three to eight individual tests, one being chosen in the final battery as the best single measure of the element.

2. This criterion is not satisfied except in a round-about way. The criterion of general athletic ability includes 41 separate tests, but the final battery includes only one for each element. However, the final battery correlates with the criterion .967, showing it to be a very valid measure.

3. Each of the separate tests in the final battery is provided with norms (scale scores) for the interpretation of data collected in the field.

4. The reliability of the separate test items is quite high, and they may be readily used for individual as well as group measurement.

²Cozens, Frederick W., op. cit.




Event                              Reliability coefficient
1. Dip on parallels                .926
2. Baseball throw for distance     .908
3. Football punt for distance      .816
4. Standing broad jump             .965
5. Quarter-mile run                .917
6. Bar snap for distance           .925
7. Dodging                         .850


5. Since times and distances are transposed into scale score units, 
a glance at the scale score tells the examiner how far above or below 
the mean the student's score in a particular event lies. 

6. The test falls short on this criterion in that two tests represent a teaching unit. For example, the Dip and the Bar Snap are used to represent Gymnastics (Heavy Apparatus), which includes not only
vaulting and the like; the Baseball Throw and Football Punt repre- 
sent game elements; the Standing Broad Jump and Quarter Mile 
represent track and field events; Dodging is included in games, track 
and field events and such activities as handball, tennis, boxing, 
basketball and the like. 

7. A low score in any event or test immediately shows weakness in a particular activity or activities, and the remedial teaching program follows. Since the scale scores for the various subtests in the battery run between 0 and 100, or three standard deviations on each side of the mean, a classification has been set up showing five divisions over this range, each division representing 1.2 standard deviations.

A score of 39 or below indicates weakness in a particular unit of 
the battery and an assignment is made on this basis. Men who are 
average or above in all units are allowed to elect any desired activity. 


Class Scale-score values 
Superior 80 up 
Above average 60-79 
Average 40-59 
Below average 20-39 
Inferior 19 and below 
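The five-way division and the weakness rule can be made concrete in a short sketch. The 20-point width of each division follows from the text (1.2 standard deviations, with six standard deviations spread over 100 points), the cutoff of 39 comes from the paragraph above, and the grouping of events into teaching units follows criterion 6. The constant and function names, the wording of the dodging assignment, and the sample scores are hypothetical.

```python
# Six standard deviations are spread over the 0-100 scale, so a division
# of 1.2 standard deviations is 1.2 * 100 / 6 = 20 scale points.
DIVISION_WIDTH = round(1.2 * 100 / 6)
CLASSES = ["Inferior", "Below average", "Average", "Above average", "Superior"]
WEAKNESS_CUTOFF = 39  # a score of 39 or below marks a weak unit

# Event number -> (event, teaching unit), following criterion 6 above.
# The dodging label is a hypothetical summary of the activities named there.
EVENTS = {
    1: ("Dip on parallels", "Gymnastics"),
    2: ("Baseball throw for distance", "Games"),
    3: ("Football punt for distance", "Games"),
    4: ("Standing broad jump", "Track and field"),
    5: ("Quarter mile", "Track and field"),
    6: ("Bar snap for distance", "Gymnastics"),
    7: ("Dodging", "Handball, boxing or basketball (student's choice)"),
}


def event_class(scale_score):
    """Place a 0-100 scale score in one of the five 20-point divisions."""
    index = min(int(scale_score // DIVISION_WIDTH), len(CLASSES) - 1)
    return CLASSES[index]


def assignments(scores):
    """Return teaching units for weak events, weakest event first.

    An empty list means the man is average or above in every unit
    and may elect any activity.
    """
    weak = sorted((s, e) for e, s in scores.items() if s <= WEAKNESS_CUTOFF)
    units = []
    for _, event in weak:
        unit = EVENTS[event][1]
        if unit not in units:  # two weak events may share one unit
            units.append(unit)
    return units


# Hypothetical scale scores for one student, keyed by event number.
student = {1: 55, 2: 35, 3: 30, 4: 50, 5: 62, 6: 48, 7: 70}
print([event_class(student[e]) for e in sorted(student)])
print(assignments(student))
```

Ordering the weak units by score mirrors the practice, described below, of assigning a student first to the activities in which he is particularly weak.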


The weak students are the special concern of the department, how- 
ever, and particular attention is given to their proper grouping. 


A few examples of diagnosis may serve to show the method.

One student shows weakness in events 2 and 3 and is assigned to Games, which embraces practice and instruction in [...] play in four sports: Touch Football, Playground Ball, Volleyball and Basketball.

A second shows weakness in events 1 and 6. He has inadequate [...] and does not know how to handle his body by [...]. He is assigned to Gymnastics.

A third shows weakness in events 4 and 5. He has inadequate [...] strength and coordination and needs training and instruction in track and field events.

A fourth shows weakness in event 7 only. He has poor [...] in changing direction quickly and in carrying his body rapidly while changing direction. He is assigned to handball, boxing, [...] or basketball according to his own desires.

Other students show general weakness in a combination of the tests named above. They are assigned first to those activities in which they are particularly weak and later to those in which they are less weak. Some practice and experience in sizing up physical situations is necessary before the examiner becomes expert in his diagnosis.

Group Classification. A further value of the method of classification indicated above lies in its power to subdivide a large group according to general ability. The total score of individuals is computed and a classification arranged as previously indicated, except that the classification is based upon general and not specific ability.


Class              Total scale-score values
Superior           496 and up
Above average      399-495
Average            302-398
Below average      205-301
Inferior           204 down


Let us suppose a class of fifty students in heavy apparatus with the following class distribution:

Average            15 students
Below average      20 students
Inferior           15 students




"Those men showing average ability will be able to progress faster 
than those whose general ability is below average, and the below 
average group will progress faster than the inferior group. Hence 
the division into groups may serve as sections within the class. A 
more difficult sequence of exercises and stunts can be set for the 
average group than for the group below average and the same holds 
true as regards the below average and inferior groups. A progressive 
course can thus be arranged to conform with the needs of individuals 
and a procedure approximating a scientific method established. 
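The sectioning just described can be sketched the same way. This sketch assumes the total-score boundaries in the table given earlier in this chapter (496 and up, 399-495, 302-398, 205-301, 204 down) and groups a class into sections by general ability; the function names, student names and totals are hypothetical.

```python
from collections import defaultdict

# Lower bound of each class on the total-score scale, highest first.
TOTAL_CLASSES = [
    (496, "Superior"),
    (399, "Above average"),
    (302, "Average"),
    (205, "Below average"),
    (0, "Inferior"),
]


def total_class(total):
    """Classify a battery total by general (not specific) ability."""
    for lower_bound, name in TOTAL_CLASSES:
        if total >= lower_bound:
            return name
    return "Inferior"  # defensive only: totals below 0 should not occur


def sections(students):
    """Group (name, total) pairs into sections, best student first in each."""
    groups = defaultdict(list)
    for name, total in sorted(students, key=lambda s: s[1], reverse=True):
        groups[total_class(total)].append(name)
    return dict(groups)


# Hypothetical heavy-apparatus class.
roster = [("Adams", 350), ("Baker", 260), ("Clark", 180), ("Davis", 310), ("Evans", 204)]
for section, names in sections(roster).items():
    print(f"{section}: {', '.join(names)}")
```

Each resulting group could then be given its own sequence of exercises and stunts, harder for the average section than for the below-average and inferior ones.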

The same situation may be worked out in a course in games. 
Instruction in the elements can be given to sectioned groups and 
teams organized within the section if the entire class is large or 
arranged according to general ability within the team if the class is 
small. Total score of the entire team may be used as a basis for 
estimating the approximate ability of that team. 

Predicting Athletic Success. A test which measures general 
athletic ability may be used to discover raw material for inter- 
scholastic or intercollegiate competition. Those individuals who 
test in the superior class are certainly prospects for membership on 
an athletic team. It must be remembered, however, that there are 
other factors besides mere physical ability and capacity which have 
a significant contribution to make in deciding upon athletic team 
membership. Among such factors may be mentioned: 


1. Mental alertness and attitude. 

2. Determination and ability to withstand the hard knocks and 
the grind. 

3. Susceptibility to expert instruction.

4. Cooperative team effort. 


Statistical Procedures in Diagnosis — Individual Items 
versus Group Averages. As has been previously intimated, test 
batteries may have a sufficient degree of reliability for purposes of 
group measurement or for survey purposes and yet single items in 
the battery may not be reliable for individual measurement. To 
illustrate, a definite case may be cited.³

An element of General Athletic Ability, namely, “Hand-eye, 
foot-eye, arm-eye coordination," together with the tests used to 
measure this general quality is shown in Table XXXI. 


³Taken from Cozens, Frederick W., op. cit., p. 145.




It can be readily seen that, because of the low reliability of some of the single events, the standard error of measurement is almost as large as the standard deviation of the distribution. In the case of Baseball Throw for a Strike, a standard error of measurement of nine means that in 68 per cent of the cases the individual’s true score lies within nine points on either side of his obtained score. Thus, if the student’s score happened to lie at the mean, 50 points, all we can say about the score is that his true score, in 68 per cent of the cases, will lie somewhere between 41 and 59.

Though many of the individual tests are not reliable, the fact of 
combination enables us to place considerable faith in the composite 


TABLE XXXI

SHOWING EFFECT OF TEST UNRELIABILITY ON DIAGNOSTIC POWER

                                            Rel.      S.D. in    S.E.
                                            coeff.    T-scale    meas.
Hand-eye, foot-eye, arm-eye coordination    .850*     5.9        2.4
Football punt for distance                  .816      10.0       4.5
Basketball throw for goal                   .621      10.0       6.2
Baseball throw for strike                   .189      10.0       9.0
Bouncing ball in air with bat               .456      10.0       7.5
Football pass for accuracy                  .515      10.0       7.0
Ball catch                                  .570      10.0       6.6
Drop kick for accuracy                      .600      10.0       6.5
Throwing for speed and accuracy             .197      10.0       9.0

*Computed by the Spearman formula for the correlation between sums or averages.


score for the entire battery. Thus a battery reliability will be 
materially greater than any of its component parts. We can say ot 
this battery, for example, that, if a student's score is 50, the chances 
are 68 in 100 that his true score will lie between 47.6 and 52.4. 
Compare this range to that cited above, 41—59. 
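Why combining unreliable tests yields a reliable battery can be sketched numerically. The footnote to Table XXXI attributes the battery coefficient to the Spearman formula for the correlation between sums. In its place, the sketch below uses the standard classical-test-theory formula for the reliability of an equally weighted composite of k standardized tests, r_c = (Σr_ii + k(k − 1)r̄) / (k + k(k − 1)r̄). That formula assumes uncorrelated errors and a single common intercorrelation r̄. The text reports no intercorrelations, so r̄ = 0.30 is purely hypothetical. The last lines recompute the 68-per-cent band for a battery score of 50, using S.D. 5.9 and a coefficient of .830, the value consistent with the 47.6-to-52.4 band quoted above.

```python
import math


def composite_reliability(reliabilities, mean_intercorrelation):
    """Reliability of an equally weighted sum (or average) of k standardized
    tests, assuming uncorrelated errors and one common intercorrelation r_bar:
        r_c = (sum(r_ii) + k(k - 1) r_bar) / (k + k(k - 1) r_bar)
    """
    k = len(reliabilities)
    shared = k * (k - 1) * mean_intercorrelation
    return (sum(reliabilities) + shared) / (k + shared)


# Reliabilities of the eight single tests in Table XXXI.
items = [0.816, 0.621, 0.189, 0.436, 0.515, 0.570, 0.600, 0.197]
R_BAR = 0.30  # HYPOTHETICAL: the text reports no intercorrelations

r_c = composite_reliability(items, R_BAR)
print(f"average single test r = {sum(items) / len(items):.3f}")
print(f"composite r (r_bar = {R_BAR}) = {r_c:.3f}")

# 68-per-cent band for a battery score of 50, using S.D. 5.9 and r = .830.
battery_se = 5.9 * math.sqrt(1.0 - 0.830)
print(f"battery SE = {battery_se:.1f}: true score between "
      f"{50 - battery_se:.1f} and {50 + battery_se:.1f}")
```

With these assumptions the composite coefficient comes out near .84, against an average of about .49 for the single tests. The exact figure depends entirely on the assumed r̄.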

The whole point of the discussion is simply this: in diagnosing 
strengths or weaknesses of individuals by means of single tests, and 
not batteries, we must make sure that the single test elements are 
sufficiently reliable to enable us to put confidence in our diagnosis. 


390 Theory and Practice of Test Administration 


APPENDIX 


Tables of Measurement 


Table       Title

XXXII       Areas of the Normal Curve
XXXIII      The Transmutation of an Order of Merit into Units of Amount or "Scores"
XXXIV       Table for Transmuting an Order of Merit into Unit Scores on the Basis
            of 10 or 100 (Range = 6σ)
XXXV        T-scores
XXXVI       Physical Growth Record for Boys, 4–11 Years of Age
XXXVII      Physical Growth Record for Boys, 11–18 Years of Age
XXXVIII     Physical Growth Record for Girls, 4–11 Years of Age
XXXIX       Physical Growth Record for Girls, 11–18 Years of Age


TABLE XXXII 


AREAS OF THE NORMAL CURVE


In the table below, the total area of the curve, or the number of cases in the theo- 
retical distribution, is taken arbitrarily as 10,000 because of the ease with which 
fractional parts of such an area may be calculated.¹ The column x/σ gives distances 
in tenths of σ measured off on the baseline from the mean, and distances in hun- 
dredths of σ are indicated in the corresponding columns .01, .02, etc. 

Example: What area of the curve is to be found between the mean and a point 
1.55σ above or below the mean? Reading x/σ as 1.55, the area is given in the table 
as 4394, or 43.94 per cent. 
 x/σ     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

 0.0    0000    0040    0080    0120    0160    0199    0239    0279    0319    0359
 0.1    0398    0438    0478    0517    0557    0596    0636    0675    0714    0753
 0.2    0793    0832    0871    0910    0948    0987    1026    1064    1103    1141
 0.3    1179    1217    1255    1293    1331    1368    1406    1443    1480    1517
 0.4    1554    1591    1628    1664    1700    1736    1772    1808    1844    1879
 0.5    1915    1950    1985    2019    2054    2088    2123    2157    2190    2224
 0.6    2257    2291    2324    2357    2389    2422    2454    2486    2517    2549
 0.7    2580    2611    2642    2673    2704    2734    2764    2794    2823    2852
 0.8    2881    2910    2939    2967    2995    3023    3051    3078    3106    3133
 0.9    3159    3186    3212    3238    3264    3289    3315    3340    3365    3389
 1.0    3413    3438    3461    3485    3508    3531    3554    3577    3599    3621
 1.1    3643    3665    3686    3708    3729    3749    3770    3790    3810    3830
 1.2    3849    3869    3888    3907    3925    3944    3962    3980    3997    4015
 1.3    4032    4049    4066    4082    4099    4115    4131    4147    4162    4177
 1.4    4192    4207    4222    4236    4251    4265    4279    4292    4306    4319
 1.5    4332    4345    4357    4370    4382    4394    4406    4418    4429    4441
 1.6    4452    4463    4474    4484    4495    4505    4515    4525    4535    4545
 1.7    4554    4564    4573    4582    4591    4599    4608    4616    4625    4633
 1.8    4641    4649    4656    4664    4671    4678    4686    4693    4699    4706
 1.9    4713    4719    4726    4732    4738    4744    4750    4756    4761    4767
 2.0    4772    4778    4783    4788    4793    4798    4803    4808    4812    4817
 2.1    4821    4826    4830    4834    4838    4842    4846    4850    4854    4857
 2.2    4861    4864    4868    4871    4875    4878    4881    4884    4887    4890
 2.3    4893    4896    4898    4901    4904    4906    4909    4911    4913    4916
 2.4    4918    4920    4922    4925    4927    4929    4931    4932    4934    4936
 2.5    4938    4940    4941    4943    4945    4946    4948    4949    4951    4952
 2.6    4953    4955    4956    4957    4959    4960    4961    4962    4963    4964
 2.7    4965    4966    4967    4968    4969    4970    4971    4972    4973    4974
 2.8    4974    4975    4976    4977    4977    4978    4979    4979    4980    4981
 2.9    4981    4982    4982    4983    4984    4984    4985    4985    4986    4986
 3.0    4986.5  4986.9  4987.4  4987.8  4988.2  4988.6  4988.9  4989.3  4989.7  4990.0
 3.1    4990.3
 3.9    4999.52
 4.0    4999.68
 4.5    4999.966
 5.0    4999.997


¹Pearson, Karl, Tables for Statisticians and Biometricians, London: Cambridge 
University Press, 1924. 
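Every entry in Table XXXII is the area between the mean and the point x/σ, on a curve whose total area is 10,000. That area equals 10,000 × ½ erf((x/σ)/√2), so any entry can be regenerated with the error function in Python's standard `math` module. A minimal sketch follows; the function name is illustrative.

```python
import math


def area_mean_to_x(x_over_sigma):
    """Area between the mean and a point x/sigma from it, for a normal
    curve whose total area is taken as 10,000 (the scale of Table XXXII)."""
    return 10_000 * 0.5 * math.erf(abs(x_over_sigma) / math.sqrt(2.0))


for x in (0.5, 1.0, 1.55, 2.0, 3.0):
    print(f"x/sigma = {x:4.2f}   area = {area_mean_to_x(x):7.1f}")
```

Rounded to the nearest whole number, `area_mean_to_x(1.55)` gives 4394, matching the worked example. Because the curve is symmetric, the same value applies above or below the mean.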






TABLE XXXIII 
THE TRANSMUTATION OF AN ORDER OF MERIT INTO UNITS OF AMOUNT OR 
“SCORES”* 

Let R represent the rank in the Order of Merit, and N the number of ranks. 
Then from the formula, Per cent position = 100(R − 0.5)/N, find the per cent 
position, and from it the score. 

Per cent   Score     Per cent   Score     Per cent   Score
   .09      9.9       22.32      6.5       83.31      3.1
   .20      9.8       23.88      6.4       84.56      3.0
   .32      9.7       25.48      6.3       85.75      2.9
   .45      9.6       27.15      6.2       86.89      2.8
   .61      9.5       28.86      6.1       87.96      2.7
   .78      9.4       30.61      6.0       88.97      2.6
   .97      9.3       32.42      5.9       89.94      2.5
  1.18      9.2       34.25      5.8       90.83      2.4
  1.42      9.1       36.15      5.7       91.67      2.3
  1.68      9.0       38.06      5.6       92.45      2.2
  1.96      8.9       40.01      5.5       93.19      2.1
  2.28      8.8       41.97      5.4       93.86      2.0
  2.63      8.7       43.97      5.3       94.49      1.9
  3.01      8.6       45.97      5.2       95.08      1.8
  3.43      8.5       47.98      5.1       95.62      1.7
  3.89      8.4       50.00      5.0       96.11      1.6
  4.38      8.3       52.02      4.9       96.57      1.5
  4.92      8.2       54.03      4.8       96.99      1.4
  5.51      8.1       56.03      4.7       97.37      1.3
  6.14      8.0       58.03      4.6       97.72      1.2
  6.81      7.9       59.99      4.5       98.04      1.1
  7.55      7.8       61.94      4.4       98.32      1.0
  8.33      7.7       63.85      4.3       98.58       .9
  9.17      7.6       65.75      4.2       98.82       .8
 10.06      7.5       67.58      4.1       99.03       .7
 11.03      7.4       69.39      4.0       99.22       .6
 12.04      7.3       71.14      3.9       99.39       .5
 13.11      7.2       72.85      3.8       99.55       .4
 14.25      7.1       74.52      3.7       99.68       .3
 15.44      7.0       76.12      3.6       99.80       .2
 16.69      6.9       77.68      3.5       99.91       .1
 18.01      6.8       79.17      3.4      100.00       .0
 19.39      6.7       80.61      3.3
 20.83      6.6       81.99      3.2


*See Hull, Clark, “The Computation of the Pearson r from Ranked Data,” 
Jr. of Applied Psychol., VI (1922), 385. 
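The procedure stated above Table XXXIII has two steps: compute the per cent position 100(R − 0.5)/N, then read off the score for the nearest tabled per cent. It can be sketched in Python under three assumptions of ours. First, only the upper half of the table (scores 9.9 to 5.0) is typed in. Second, the lower half is generated from the table's symmetry: score s at p corresponds to 10 − s at 100 − p. Third, a position falling between two tabled values takes the score of the nearer one; linear interpolation would be an equally defensible choice. The names are hypothetical.

```python
import bisect

# Per cent positions from Table XXXIII for the scores 9.9, 9.8, ..., 5.0.
UPPER_HALF = [
    0.09, 0.20, 0.32, 0.45, 0.61, 0.78, 0.97, 1.18, 1.42, 1.68,
    1.96, 2.28, 2.63, 3.01, 3.43, 3.89, 4.38, 4.92, 5.51, 6.14,
    6.81, 7.55, 8.33, 9.17, 10.06, 11.03, 12.04, 13.11, 14.25, 15.44,
    16.69, 18.01, 19.39, 20.83, 22.32, 23.88, 25.48, 27.15, 28.86, 30.61,
    32.42, 34.25, 36.15, 38.06, 40.01, 41.97, 43.97, 45.97, 47.98, 50.00,
]


def build_table():
    """Return (per cent position, score) pairs for scores 9.9 down to 0.0."""
    upper = [(p, round(9.9 - 0.1 * i, 1)) for i, p in enumerate(UPPER_HALF)]
    # The table is symmetric: score s at p corresponds to 10 - s at 100 - p.
    lower = [(round(100 - p, 2), round(10 - s, 1)) for p, s in reversed(upper[:-1])]
    return upper + lower + [(100.0, 0.0)]


TABLE = build_table()
PERCENTS = [p for p, _ in TABLE]


def score_for_rank(rank, n):
    """Per cent position 100(R - 0.5)/N and the score of the nearest tabled
    per cent (nearest-entry look-up is an assumption; one could interpolate)."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    position = 100.0 * (rank - 0.5) / n
    i = bisect.bisect_left(PERCENTS, position)
    nearest = min((j for j in (i - 1, i) if 0 <= j < len(TABLE)),
                  key=lambda j: abs(PERCENTS[j] - position))
    return position, TABLE[nearest][1]


for rank in range(1, 11):  # ten pupils ranked 1 (best) to 10
    print(rank, score_for_rank(rank, 10))
```

For ten pupils, the first is placed at 5 per cent and receives 8.2. The last is placed at 95 per cent and receives 1.8. Pupils in mirror-image ranks receive scores summing to 10.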




TABLE XXXIV 


TABLE FOR TRANSMUTING AN ORDER OF MERIT INTO UNIT SCORES ON THE BASIS 
OF 10 OR 100 
Range = 6σ 

Per cent  Score    Per cent  Score    Per cent  Score    Per cent  Score
   .03     9.9       7.42     7.4      52.40     4.9      94.13     2.4
   .07     9.8       8.32     7.3      54.79     4.8      94.82     2.3
   .11     9.7       9.28     7.2      57.15     4.7      95.43     2.2
   .17     9.6      10.32     7.1      59.50     4.6      95.99     2.1
   .23     9.5      11.45     7.0      61.81     4.5      96.49     2.0
   .29     9.4      12.66     6.9      64.08     4.4      96.95     1.9
   .37     9.3      13.96     6.8      66.30     4.3      97.35     1.8
   .48     9.2      15.34     6.7      68.46     4.2      97.70     1.7
   .59     9.1      16.80     6.6      70.57     4.1      98.02     1.6
   .71     9.0      18.37     6.5      72.60     4.0      98.31     1.5
   .85     8.9      20.01     6.4      74.57     3.9      98.56     1.4
  1.03     8.8      21.73     6.3      76.46     3.8      98.78     1.3
  1.22     8.7      23.54     6.2      78.27     3.7      98.97     1.2
  1.44     8.6      25.43     6.1      79.99     3.6      99.15     1.1
  1.69     8.5      27.40     6.0      81.63     3.5      99.29     1.0
  1.98     8.4      29.43     5.9      83.20     3.4      99.41      .9
  2.30     8.3      31.54     5.8      84.66     3.3      99.52      .8
  2.65     8.2      33.70     5.7      86.04     3.2      99.63      .7
  3.05     8.1      35.92     5.6      87.34     3.1      99.71      .6
  3.51     8.0      38.19     5.5      88.55     3.0      99.77      .5
  4.01     7.9      40.50     5.4      89.68     2.9      99.83      .4
  4.57     7.8      42.85     5.3      90.72     2.8      99.89      .3
  5.18     7.7      45.21     5.2      91.68     2.7      99.93      .2
  5.87     7.6      47.60     5.1      92.58     2.6      99.97      .1
  6.61     7.5      50.00     5.0      93.39     2.5     100.00      .0


TABLE XXXV 
“T-SCORES” 

 S.D.     Per          S.D.    Per       S.D.    Per       S.D.    Per
 value    cent         value   cent      value   cent      value   cent

  0      99.999971      25     99.38      50     50.00      75     0.62
  0.5    99.999963      25.5   99.29      50.5   48.01      75.5   0.54
  1      99.999952      26     99.18      51     46.02      76     0.47
  1.5    99.999938      26.5   99.06      51.5   44.04      76.5   0.40
  2      99.99992       27     98.93      52     42.07      77     0.35
  2.5    99.99990       27.5   98.78      52.5   40.13      77.5   0.30
  3      99.99987       28     98.61      53     38.21      78     0.26
  3.5    99.99983       28.5   98.42      53.5   36.32      78.5   0.22
  4      99.99979       29     98.21      54     34.46      79     0.19
  4.5    99.99973       29.5   97.98      54.5   32.64      79.5   0.16
  5      99.99966       30     97.72      55     30.85      80     0.13
  5.5    99.99957       30.5   97.44      55.5   29.12      80.5   0.11
  6      99.99946       31     97.13      56     27.43      81     0.097
  6.5    99.99932       31.5   96.78      56.5   25.78      81.5   0.082
  7      99.99915       32     96.41      57     24.20      82     0.069
  7.5    99.9989        32.5   95.99      57.5   22.66      82.5   0.058
  8      99.9987        33     95.54      58     21.19      83     0.048
  8.5    99.9983        33.5   95.05      58.5   19.77      83.5   0.040
  9      99.9979        34     94.52      59     18.41      84     0.034
  9.5    99.9974        34.5   93.94      59.5   17.11      84.5   0.028
 10      99.9968        35     93.32      60     15.87      85     0.023
 10.5    99.9961        35.5   92.65      60.5   14.69      85.5   0.019
 11      99.9952        36     91.92      61     13.57      86     0.016
 11.5    99.9941        36.5   91.15      61.5   12.51      86.5   0.013
 12      99.9928        37     90.32      62     11.51      87     0.011
 12.5    99.9912        37.5   89.44      62.5   10.56      87.5   0.009
 13      99.989         38     88.49      63      9.68      88     0.007
 13.5    99.987         38.5   87.49      63.5    8.85      88.5   0.0059
 14      99.984         39     86.43      64      8.08      89     0.0048
 14.5    99.981         39.5   85.31      64.5    7.35      89.5   0.0039
 15      99.977         40     84.13      65      6.68      90     0.0032
 15.5    99.972         40.5   82.89      65.5    6.06      90.5   0.0026
 16      99.966         41     81.59      66      5.48      91     0.0021
 16.5    99.960         41.5   80.23      66.5    4.95      91.5   0.0017
 17      99.952         42     78.81      67      4.46      92     0.0013
 17.5    99.942         42.5   77.34      67.5    4.01      92.5   0.0011
 18      99.931         43     75.80      68      3.59      93     0.0009
 18.5    99.918         43.5   74.22      68.5    3.22      93.5   0.0007
 19      99.903         44     72.57      69      2.87      94     0.0005
 19.5    99.886         44.5   70.88      69.5    2.56      94.5   0.00043
 20      99.865         45     69.15      70      2.28      95     0.00034
 20.5    99.84          45.5   67.36      70.5    2.02      95.5   0.00027
 21      99.81          46     65.54      71      1.79      96     0.00021
 21.5    99.78          46.5   63.68      71.5    1.58      96.5   0.00017
 22      99.74          47     61.79      72      1.39      97     0.00013
 22.5    99.70          47.5   59.87      72.5    1.22      97.5   0.00010
 23      99.65          48     57.93      73      1.07      98     0.00008
 23.5    99.60          48.5   55.96      73.5    0.94      98.5   0.000062
 24      99.53          49     53.98      74      0.82      99     0.000048
 24.5    99.46          49.5   51.99      74.5    0.71      99.5   0.000037
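Table XXXV gives, for each T-score (mean 50, S.D. 10), the per cent of a normal distribution lying above that point. The look-up can be run in both directions with the standard library, as sketched below. `math.erfc` gives the tail area, and a simple bisection inverts it, turning a per cent above into a T-score. This is a minimal sketch; the function names and the bisection tolerance are our own choices.

```python
import math


def percent_above_t(t):
    """Per cent of a normal distribution lying above T-score t
    (T-scale: mean 50, S.D. 10), the quantity tabulated in Table XXXV."""
    z = (t - 50.0) / 10.0
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))


def t_from_percent_above(percent, lo=0.0, hi=100.0, tol=1e-6):
    """Invert percent_above_t by bisection (it decreases as t increases)."""
    if not 0.0 < percent < 100.0:
        raise ValueError("percent must lie strictly between 0 and 100")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if percent_above_t(mid) > percent:
            lo = mid  # too many cases above: the T-score must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


for t in (25, 40, 50, 60, 75):
    print(f"T = {t}: {percent_above_t(t):.2f} per cent above")
print(f"15.87 per cent above -> T = {t_from_percent_above(15.87):.1f}")
```

For example, a pupil whose position leaves 15.87 per cent of the group above him corresponds to a T-score of about 60, the same pairing shown in the table.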


TABLE XXXVI 


PHYSICAL GROWTH RECORD FOR BOYS³ 
4–11 Years of Age 

[Chart: height (inches) and weight (pounds) plotted against age, 4 to 11 years, at half-year intervals.]


³Joint Committee on Health Problems in Education of the National Education 
Association and the American Medical Association. Washington, D. C.: 
National Education Association, or Chicago, Illinois: American Medical 
Association. 




TABLE XXXVII 


PHYSICAL GROWTH RECORD FOR BOYS³ 
11–18 Years of Age 

[Chart: height (inches) and weight (pounds) plotted against age, 11 to 18 years, at half-year intervals, with moderately tall and moderately short zones for height and moderately heavy and moderately light zones for weight.]




TABLE XXXVIII 


PHYSICAL GROWTH RECORD FOR GIRLS³ 
4–11 Years of Age 

[Chart: height (inches) and weight (pounds) plotted against age, 4 to 11 years, at half-year intervals, with growth zones marked for each measure.]




TABLE XXXIX 


PHYSICAL GROWTH RECORD FOR GIRLS³ 
11–18 Years of Age 

[Chart: height (inches) and weight (pounds) plotted against age, 11 to 18 years, at half-year intervals, with moderately tall and moderately short zones for height and corresponding zones for weight.]


INDEX OF SUBJECTS AND AUTHORS CITED


Ability, athletic, general, 144 
test for the college man, 156 
tests of, 89 
motor, general, 144 
tests of, 144 
ACH Index of nutritional status, 52 
Achievement scales, 
for boys and girls in elementary 
and junior high schools, 105 
boys in secondary schools, 106 
college men, 107 
secondary school girls and 
college women, 108 
in basketball skills, 197 
field hockey skills, 199 
motor fitness events, 108, 111 
soccer skills, 202 
speed swimming, 208 
speedball skills, 202 
track and field events, 212 
volleyball skills, 214, 215 
wartime swimming events, 209 
Achievement tests, athletic, 89 
in activities for physical education 
teachers in training, 102 
Adams, E. G., 140 
Administration of tests, 525 
Administrative economy, 555 
Age aim charts, 91 
Age-height-weight, 
classification, 89 
normative tables, 51, 396 
use of, 51 
Aiming test, 161 
Alden et al. motor ability test for 
college women, 152 


All-around athletic performance, 215 
Allen, F. C., 258 
Anderson, T. W., 146 
weighted strength tests, 152 
Andover physical fitness testing pro- 
gram, 179 
Annett rhythmic capacity test, 200 
Anteroposterior posture, 46 
Anthropometric charts, 40 
Anthropometric measurement, 32, 39 
Anthropometry, development of, 17, 39 
Appraisal of instructors, 
methods, and materials, 12 
pupil progress, 7 
Archery, 195 
Armed Services physical fitness tests, 
171 
Ataxiameter, 162 
Athletic ability, general, 144, 156 
achievement tests, 55, 89 
badge tests, 90, 190 
handicapping, 116 
index, 95 
performance, all-round, 215 
quotient, 104 


Badge tests, athletic, 90, 190 
Badminton tests, 
knowledge, 220 
skill, 194 
Balance tests, 165 
Baldwin, B. T., 19 
Barach’s cardiac functional test, 67, 84 
Barringer’s cardiac functional test, 
24, 68 




Baruch committee on physical medi- 
cine, subcommittee on physical 
fitness, 167 

Baseball tests, 

knowledge, 221 
skill, 194 

Basketball, achievement scales, 258 
rating scales, 258 
tests, knowledge, 221 

motor ability, 197 
skills, 195 

Bass's balance tests, 163 

Bassett, Glassow and Locke's volley- 
ball skill tests, 215 

Beall's tennis tests, 192, 210 

Behavior rating scales, 38, 244 

Bennett's test of diving, 240 

Benton's measures of ability to learn 
dance movements, 202 

Best-fit index, 118 

Bilhuber's general motor efficiency 
test, 160 
Blanchard's behavior rating scale, 245 
Bliss' study of progression, 96, 192 
Blocks test, 159 
Blood ptosis test, 64 
Body build or type, 56 
Body mechanics, measurement of, 43 
Bookwalter, K. W., 121 
motor fitness indices, 177 
Bountz's soccer skill tests, 204 
Bovard, J. F., 139, 142 
Boyer's survey test in health education, 231 
Brace's basketball skill tests, 192 
football achievement test, 200 
scale of motor ability tests, 145, 147, 158 
Iowa revision of, 147 
Brady’s repeated volleys test, 215 
Brouha's step test, 76, 178 
Brown's ice hockey knowledge test, 222 
skill test, 200 
Brownell's health information tests, 230, 231 
scale for measuring anteroposterior posture, 46 
Buchanan's speedball tests, 205 
Buck's methods of testing response to 
auditory rhythm, 202 
Build and stature, 
indices of, 54 
studies involving, 56 
types of, 58 


Index of Subjects 


Butler, L. K., 86 
Byrd's health attitude scale, 250 


CALIFORNIA, achievement scales, 104 
classification studies, 117 
decathlon, 95 
group functional test, 81 
physical fitness pentathlon, 176, 191 
Camp Archery Association, 193 
Capacity, and endurance tests, 159 
motor, tests of, 35, 144 
vital, 182 
Cardiac functional tests, 25, 33, 62 
principles of, 62 
typical comments on, 83 
Cardiovascular tests, 23, 53, 62 
Carpenter, A., 117, 121, 133, 135, 137 
general motor ability and capacity 
test for first three grades, 150 
weighted strength tests for elemen- 
tary school children, 133 
Central tendency, measures of, 262 
Chinning, 129, 136 
Circulatory fitness tests, 79 
City College of New York program of 
health and physical fitness evalua- 
tion, 179 
Clarke's functional physical fitness 
test for college women, 77 
Classification of pupils, 9, 34, 114 
history of problem, 114 
indices for, 114 
comparison of, 120 
computation of, 288 
five-variable problem, 290 
Cleveland physical ability test for 
boys, 96 
Clevett's test for all-around athletic 
performance, 215 
Colleges, development of physical 
education testing in, 27 
Collins and Howe's balance test, 165 
measurement of organic and neuro- 
muscular fitness, 185 
stairs test, 160, 165 
Column or bar diagram, 299 
Conformateur, 44 
Construction of tests, 339 
Coordination tests, 161 
finer muscle adjustments, 161 
large muscle adjustments, 162 
Correlation, 276 
computation of, 277 
from a scattergram, 278 




Correlation, meaning of, 276 
partial and multiple, 285 
ratio, 285 
Cottle and Moore's test of health 
awareness, 251 
Cozens, F. W., 29, 105, 107, 108, 111, 
142, 155, 158, 176, 179, 195, 
197, 204, 208, 211, 585 
achievement scales, 104 
classification index, 117 
fall decathlon, 215 
general athletic ability for the 
college man, 156 
grouping of college men, 119 
leap meter, 159 
strength and general athletic 
ability, 134 
Crabtree's objective test for riding, 240 
Crampton, C. W., 25, 85 
blood ptosis test, 64 
Criteria for selecting tests, 327 
for establishing validity, 341 
Criterion score, securing an adequate, 
348 
Crook's scale, 46 
Crow and Ryan's health and safety 
education test, 251 
Cubberley, H. J., 195, 197, 204, 208, 211 
Cureton, T. K., Jr., 59, 85, 86, 157 
motor fitness tests, 175, 182 
multiple rating scales, 241 
swimming test for beginners (rotational method), 206 
intermediate and advanced, 207 
test for endurance in speed swimming, 208 
of antero-posterior posture, 48 
of foot fitness, 50 
Curve, normal probability, 259 
application of, 302 
properties of, 296 
uses of, 296 


Dance, appreciation test, 221. See also Rhythm tests. 
Decathlon, California, 95 

Detroit, 95 

fall, for college track squads, 215 
Dennenholz, S. O., 231 
Detroit decathlon, 95 
DeWitt, R. T., 155, 136, 157 
Diagnosis, 8, 382 
Diagnostic tests, purpose of, 382 
requirements for, 384 




Diagram, column or bar, 299 
Dipping and chinning, 
McCloy's method of scoring, 129 
Directions for administering tests, 
preparing the manual of, 364 
Discontinuous measures, 258 
Discrete measures, 258 
Diving rating scales, 259 
Doscher's first aid examinations, 250 
Dunder's multiple strength indices, 151 
Duplicate forms, 556 
Dyer's backboard test of tennis ability, 
211 
basketball motor ability test, 197 
Dynamic balance test, 165 
Dynamometer tests for endurance, 159 


Edgren’s test of ability and progress 
in basketball, 195, 197 
tests to determine progress in tennis, 
211 
Educability, tests, motor, 144 
Efficiency index, 158 
Ehrlich, G., 150 
Elbel, E. R., 86, 156, 258 
Endurance, tests of, 159, 208 
Energy index, 67 
Englehardt and Schwegler's variation 
of the Sargent jump, 158 
Ergograph, 25, 126, 148 
Espenschade, A., 146, 147, 148 
Evaluation, defined, 4 
importance of, 6 
steps in process of, 4 
tools of, 4 


Evaluative procedures, importance of, 7 


types of, 4 
uses of, 7 
Even-step interval scoring plan, 310, 
317 
Everts, E. W., 155 
Fall decathlon for college track 
squads, 213 
Fatigue test, 71, 159 
Field hockey, achievement scales, 199 
tests, knowledge, 222 
skills, 198 
Fit to fight test, 175 
Fitness, measures, earlier, 189 
recent, 171 
motor, defined, 167 
tests, 171 
physical, 167 
defined, 167 




Flarimeter, 80 
Foot, measurements, 50 
speed test, 164 
Football tests, tackle, 200 
touch, 199 
Force-meter, 139 
Form diagnosis sheets, 240 
Forsythe and Rugen's health knowl- 
edge test, 229 
Foster's physical efficiency test, 66 
Four-point classification plan, 115 
Franzen's health education tests, 228 
nutritional status test, ACH index, 
52 
French's knowledge tests for 
activities in the major curriculum, 
226 
field hockey officials, 226 
sample rating forms, 240 
Frequency, distribution, 252 
polygon, 297 
Friermood's basketball tests, 196 
Fundamentals of motor performance 
for secondary school girls, 97 


Galton’s test, 126 
Garfiel's motor ability test for college 
women, 151 
Gates-Strang health knowledge test, 
227 
General athletic ability, 144 
test for the college man, 156 
motor ability, 94, 144 
and capacity tests for the first 
three grades, 150 
indices as classifiers, 121 
multiple strength indices of, 151 
tests, 146 
motor capacity, 144, 149, 158 
motor efficiency test, 160 
qualities, measurement of, 35, 144 
Gerrish's force-meter, 139 
Gill-Schrammel physiology test, 232 
Gire, E., 146, 148 
Gold's diet and dental health test, 230 
Golf, 222 
Grading, 7 
Graph, percentile, 300 
Graphical methods, elementary, 296 
Graybeal’s motor ability tests, 153 
Green, E. L., 86 
Grisier's knowledge test of field hockey 
rules, 222 
Group physical condition tests, 80 




Grouping of students, 114 
college men, 119 
high school girls, 119 

Guidance, 8 

Gymnastics, 199 


Habits, tests, 227, 234 
Harvard step test, 76 
Hathaway, C. J., 135 
Havlicek, F. J., 136 
Health information and knowledge 
tests, 227 
inventories, 232 


Heart tests, functional, 25, 62. See 
also Cardiac functional tests. 
Heath-Rodgers knowledge and skill 
tests, 
in playground baseball, 194, 
221 


in soccer, 203, 225 
Height-weight-age normative tables, 
51, 396 
Hemphill's information tests in health 
and physical education, 224 
Henry, F., 85, 140 
Hewitt's achievement scales in war- 
time swimming, 209 
tennis knowledge test, 224 


Hill's test, 158 
Histogram, 298 
Historical sketch, development of 
measurement in physical education, 17 
Hitchcock, E., 19, 39 
Hockey, field, achievement scales, 199 
tests, knowledge, 222 
skills, 198 
ice, tests, knowledge, 222 
skills, 200 
Homogeneous grouping of students, 114 
Howe, and Collins' balance test, 163 
measurement of organic and neuro- 
muscular fitness, 185 
stairs test, 160, 165 
and MacEwan's posture measure- 
ment, 47 
and Powell's motor ability tests, 156 
Humiston's motor ability test for 
college women, 154 
Hyde's archery scales, 195 
Ice hockey test, knowledge, 222 
skills, 200 
Illinois high school physical condition 
test, 180 




Increased increment scoring plan, 310, 
318 
Information tests, 37, 219 
in health, 227 
in physical education, 220 
Indiana physical fitness test, 177 
University motor fitness indices, 177 
Indices for homogeneous grouping, 120 
motor ability, 121 
strength, 121 
measuring physical efficiency, 180 
of stature and build, 54 
Instructional methodology, use of 
tests as, 12 
Instructions for giving tests, guiding 
principles in the preparation of, 575 
Instruments for measuring postural 
conditions, 45 
Iowa physical fitness battery for college 
women, 178 
revision of the Brace scale of motor 
ability tests, 147, 158 


JENKINS’ motor achievements of chil- 
dren, 96 

Johns’ health practice inventory, 230 

Johnson’s physical skill tests, 147, 158, 
202 

Jones, H. E., 137 

Judgment scores, transmuting, 505 

tables for, 393, 394 
Jump test, Sargent, 158, 158 
Junior pentathlon program, 105 


Karpovich, P. V., 156 
Kellogg's dynamometer, 21, 126 
Kilander's health knowledge test, 229 
Kistler's methods of classifying pupils, 
121 
Knighton's soccer knowledge test, 225 
Knowledge and information tests, 37, 
219 
construction and use of, 220 
in health knowledge, habits and 
attitudes, 227 
in physical education activities, 
220 
Korb's comparograph, 47 


Larson, L., 79, 156 
dynamic strength test, 121, 131 
motor ability tests, 122, 158 
organic efficiency test, 78 
static dynamometrical strength test, 
131 




Lauritsen's technique of measuring 
sportsmanship, 245 

Leap-meter, 159 

Lemon, E., 201 

Los Angeles achievement expectancy 
tables, 94 

Ludlum, F. E., 56 


McCall’s T-scores, 315 
McCloy, C. H., 155, 137, 139, 146, 147 
athletic handicapping, 116 
quotient, 104 
behavior rating scale, 244 
classification index, 116, 125, 140 
general motor ability test, 150 
motor capacity test, 149, 158 
method of scoring chinning and 
dipping, 129 
scoring scales, 104 
test of present condition, 78 
McCurdy, J. H., 
condition test, 65, 86 
physical capacity index, 141 
test, 141 
test of organic efficiency, 78 
MacEwan-Howe posture measurement, 47 
McKenzie, R. T., 18 
Marking, 7 
Martin's resistance test, 127, 156 
Mean, 262 
Measurement, defined, 4 
early development of, 17 
modern developments in, 31 
need for, 14 
uses of, 7 
Median, 265 
Metcalf's standards, 100 
Metheny, E., 148 
Meylan's cardiac functional test, 65 
test for grading in physical educa- 
tion, 98 
Michigan pulse rate test, 80 
Midpoint, 255 
Miller, W. A., 86 
Minnesota motor ability tests for 
college women, 155 
physical education knowledge tests, 
225 
Mode, 267 
Money's tests for evaluating basket- 
ball players, 195 
Mosso's ergograph, 23, 126, 148 




Motivation, 10 
Motor ability, 144 
tests, 94, 145 
value of selected, 158 
capacity, 144 
tests, 149 
educability, 144 
fitness, 
defined, 167 
tests, 171 
Mott-Lockhart's table tennis test, 212 
Multiple rating scales of physical 
fitness, 241 
Murray's dance appreciation test, 221 


NATIONAL Amateur Athletic Federation 
physical efficiency standards, 101 
Collegiate Athletic Association diving 
rating scale, 239 
freshman physical efficiency 
test, 184 
Section on Women’s Athletics, 
National Officials’ Rating 
Committee tests, 226 
rating of sports officials, 241 
standards of achievement for boys 
and girls, 109 
in motor fitness events, 112 
Naval Aviation Training Division 
fitness appraisal program, 173 
swimming standards, 209 
Navy standard physical fitness test, 173 
Neher's health inventory, 232 
Neilson, N. P., 105, 107, 108, 111, 117, 
142, 176, 179, 195, 197, 204, 208, 211 
Neuromuscular control tests, 159 
New York State physical fitness stand- 
ards, 97, 217 
Newton motor ability test, 156 
Normal probability curve, 259 
application of, 302 
some properties and uses of, 296 
Norms, use of, 334 
establishing physical activity, 362 
Nutritional status, 51 


OBERLIN college test, 101 
Objectivity, 555 

Ogive curve, 300 

O’Neel’s study of behavior traits, 244 
Oppenheimer's scale, 180 

Oregon motor ability test, 152 




Organic efficiency test, 78 
and neuromuscular fitness in 
college women, 185 
Orleans and Sealy’s health knowledge 
test, 232 


Pace, C. R., 4, 5 
Pack test of exercise tolerance, 77 
Pass or fail method of scoring, 310 
Pearson product-moment method of 
computing correlation coefficient, 277 
Pedograph, 44 
Pedorule, 44, 50 
Percentile graph, 300 
Percentiles, 267 
Philadelphia public schools age aim 
charts, 91 
Phillips’ badminton knowledge test, 221 
Photographs, anthropometric, 43 
Physical ability tests, development 
of, 25 
pentathlon, 11 
aptitude test, 102 
capacity test, 141 
efficiency, 66 
index, 133 
indices for measuring, 30 
test, 102 
for college freshmen men, 184 
women, 184 
fitness, 56 
assessment of, 241 
defined, 167 
index, 202 
self rating of, 241 
types of measures, 170 
growth records, 
for boys, 4-11 years of age, 396 
11-18 years of age, 397 
for girls, 4-11 years of age, 398 
11-18 years of age, 399 
performance levels for high school 
girls, 110 
test of man, 138 
Picking up paper test, 164 
Pignet's formula, 181 
Posture measurement, 45 
special instruments for, 43 
subjective ratings, 49 
Posturemeter, 44 
Powell and Howe's motor ability 
tests, 156 
Power tests, 137 




Prediction of, 
ability in gymnastics, 197 
to learn dance movements, 202 
athletic success, 388 
physical skill from a test score, 559 
Present condition test, 78 
Probable error, 271 

of the mean, 373 

Program score cards, 242 
Promotion, 7 
Pryor's width-weight tables, 54 
Pulse-ratio test, 75 
Pupil, progress of, 7 
Pursuit pendulum, 162 
Pursuitmeter, 161 
QUARTILE deviation, 272 
Quartiles, 272 


Range, 268 
Rating forms, 240 
scales, 38 
activity, 258 
behavior, 244 
construction and use of, 255 
defined, 254 
program score cards, 242 
Rational athletics for boys and girls, 92 
Regression, 281 
Reilly's rational athletics for boys and 
girls, 92, 190 
Reliability, 329 
degree of, 552 
determining of, for a battery, 358 
effect of test length on, 331 
in individual and group measurement, 332 
influence of, on validity, 332 
of a difference between means, 274 
of various measures, 272 
significance of, 330 
Research, 13 
Rhythm tests, 200 
Richards' efficiency tests, 91 
Riding rating scale, 240 
Rodgers, E. G., 98 
and Heath knowledge and skill tests 
in playground baseball, 
194, 221 
in soccer, 203, 223 
Rogers, F. R., 22 
athletic index, 95 
classification of students, 121 
physical fitness index, 129, 182 




Rogers, F. R. (Continued) 
short strength index, 129, 155 
strength index, 121, 128, 153, 135, 141 

Rooks’ health knowledge test for college 
freshmen, 229 

Rotational method swimming test, 206 

Rump, A. H., 131 

Russell and Lange volleyball skill 
tests, 215 


SAMPLING, 272 
Sargent, D. A., 20, 21, 125 
anthropometric chart, 40, 41 
contribution to anthropometry, 40 
jump test, 158, 158 
further studies on, 139 
physical test of a man, 158 
Schwegler and Englehardt 
variation of, 158 
test of speed and endurance, 138 
Schaufele's soccer skills tests, 204 
Schmithals and French's achievement 
tests in field hockey, 198 
Schneider's cardiovascular test, 24, 69, 
85, 84 
Schuettner's scheme for stimulating 
interest, 99 
Schwartz's knowledge and achievement 
tests in girls’ basketball, 198, 221 
Schwegler and Englehardt's variation 
of the Sargent jump, 158 
Scoliometer, 44 
Score cards, 
check list for survey of health and 
physical education programs in 
secondary schools, 245 
elementary and secondary school 
health and physical education, 
245 
for Y.M.C.A. physical education 
programs, 245 
for physical education programs 
for physically handicapped chil- 
dren, 244 
secondary school physical educa- 
tion programs for boys and girls, 
242 
Scoring scales, 55, 104 
Scoring tests, methods of, 309 
division into classes, 5 
even step interval, 310 
increased increment, 310 
minimum standards, 312 




Scoring tests, pass or fail, 310 
standard scores, 314 
success or failure, 311
T-scores, 315 
Scott's badminton knowledge tests, 220 
skill tests, 194 
general motor ability tests, 155
sample rating forms, 240 
swimming knowledge test, 224 
tennis knowledge tests, 224 
Seashore rhythm tests, 200, 202 
Seashore, H. B., 165 
Springfield beam-walking test, 165 
Sectioning students into homogeneous teaching units, 147, 148, 158
Sefton's knowledge test on source
material, 226 
Shaw and Troyer’s health knowledge 
tests, 250 
Sheldon's somatotypes, 58 
Short screen test, 175 
Sigma Delta Psi, 99 
Slater-Hammel, A. T., 86 
Snell’s physical education knowledge 
tests, 225 
Soccer, achievement scales, 204 
tests, knowledge, 223 
skills, 203 
Social efficiency, 246 
Somatotypes, 58 
Southworth’s health knowledge and attitudes test, 231
Speedball, achievement scales, 204
tests, skills, 205
Sport technique, early work in measurement of, 190
fundamentals for measuring, 189
specific experimental contributions, 190, 192
tests, 37, 189
Springfield beam-walking test, 163
posture measurements, 48
strength tests, 132
Stairs test, 160
Standard, deviation, 268
computation of, 269
meaning of, 270
error of estimate, 284
of the mean, 273
standard deviation, 273
scores, 314
Standardized directions, 337
Stansbury’s motor ability indices, 122
strength index, 121, 132


Index of Subjects 


Static balance test, 163, 202 
Statistical methods, elementary, 251 
Stature and build, 

indices of, 54 

studies involving, 56 

types of, 58 
Step interval, 253 


Stepping stone test of dynamic balance, 
164 


Strength, dynamic, 131 
general athletic ability and, 134 
index, 133 
indices as classifiers, 121 
of general motor ability, 131 
prediction of total, 150 
static dynamometrical, 131
testing, development of, 21, 35, 124 
recent research on, 135 
tests, 124 
weighted, for college women, 133 
for elementary school children, 
133 
for high school boys, 132 
girls, 132 
Subcommittee of Baruch committee on 
physical medicine, 167 
Swimming tests, knowledge, 225 
skills, 206 


TABLE tennis, 212 
Tapping test, 161 
Target test, 162, 164 
ms pack test of exercise tolerance, 


Tennis tests, knowledge, 223 
skills, 210 
Test, administration, 325 
directions for giving tests, 371
organization of assistants, 373 
of program, 366 
policies for, 366 
preparation for, 373
technique of, 369 
construction in physical education, 339
assembly of trial battery of tests, 345
combining tests, 350
determination of battery reliability, 357
of quality to be measured, 339
elimination of tests, 347




Test, const. in phys. ed. (cont.) 
establishing physical activity age norms, 362
experimental conditions, 349
multiple correlation of battery with criterion, 354
prediction of physical skill from test score, 359
preliminary try-out of tests, 345
preparation of manual for use in administering test battery, 364
scoring, 349
securing adequate criterion score, 348
reliability of simple tests, 347
selection of final batteries, 349
of test items, 346
setting up criteria for establishing validity, 341
analysis of content of courses of study, 342
combined judgments of experts, 342
correlation with previously validated tests, 344
determining social usefulness, 344
increase of accomplishment with successive ages, 345
validity, measures of success, 343
use of rating scales, 345
typical problem, 345
Testing in colleges and universities, 
development of, 27 
in public and private schools, 26 
Tests, criteria for selecting, 327
Three hole test, 161
Tools of measurement, 250 
Tracing test, 161 
Track and Field, 212 
Transmuting judgment scores, 305
tables for, 393, 394
Trieb, M. H., 105, 107, 108, 111, 142, 
176, 179 
Troyer, M. E., 4, 5, 250 
Trusler-Arnett health knowledge test, 
232 




T-scores, 315
table of, 395
Tuttle's pulse-ratio test, 75 


UNITED States, Army, 175
Army Air Forces, 172 
Army Specialized Training Divi- 
sion, 175 
Field Hockey Association, 226 
Military Academy physical efficiency test, 102
physical aptitude test, 102, 175
Naval Aviation Training Division, 
swimming standards, 209 
Navy standard physical fitness 
test, 175 
Office of Education, physical fit- 
ness tests, 174 
Universal dynamometer, 126 
University of California, physical effi- 
ciency test, 185 
Illinois motor fitness tests, 175 
plan of proficiency examinations, 
105 
Minnesota physical education knowl- 
edge tests, 225 
Oregon motor ability test, 152 
Wisconsin volleyball skill tests, 215 
U-tube manometer test, 71 


VALIDITY, 327
criteria for establishing, 341
Van Dalen, D., 140, 182 
Vanderhoof's soccer skill test, 205 
Variability, measures of, 267 
Vertical jump, 138 
Vickers, V. S., 146 
Victory corps program, 112 
Vital capacity, 182 
Volleyball, achievement scales, 214 
skill tests, 214 
Voltmer and Watts’ rating scale of 
player ability in basketball, 239 


WAGNER's tennis knowledge test, 223
skill test, 211 
Waters’ Y.M.C.A. program score card, 
245 
Wedemeyer, R., 136 
Weiss, R. A., 136 
Wendler's weighted strength tests, 151 




Wetzel's grid technique, 53 

Wickens and Kiphuth's posture meas- 
urement, 49 

Width-weight tables, Pryor's, 54 

Wilson's weighted strength tests for 
women, 135 

Women's Army Corps, 172 




ЖЧ Ше health behavior scale, 


X-ray, 44 


ы motor and physical fitness tests. 


Young, G., 196 

